Abstract

Chaotic systems have received much attention in the mathematics and physics communities
in the last two decades, and they are receiving increasing attention in various engineering
disciplines as well. Experimental evidence suggests that these systems may be useful models
for a wide variety of physical phenomena, including turbulence, vibrations of buckled elastic
structures, and the behavior of certain feedback control devices.

This thesis deals with both the analysis and synthesis of chaotic maps and time-sampled
chaotic flows, with a focus on the problems and issues that arise with noise-corrupted orbit
segments generated by these maps and flows. Both dissipative and nondissipative systems
are analyzed, and nondissipative systems are also considered in the context of synthesis.
With respect to dissipative systems, three probabilistic state estimation algorithms are
introduced and applied to three problem scenarios, with the scenarios distinguished by the
amount of a priori knowledge of the dynamics of the underlying chaotic system.

Cramer-Rao, Barankin, and Weiss-Weinstein upper bounds on state estimator performance
are derived and both experimentally and qualitatively analyzed. The analysis reveals
that intrinsic properties of chaotic systems (positive Lyapunov exponents and boundedness
of attractors) have a fundamental influence on achievable state estimator performance with
these systems.

With  respect  to  nondissipative  systems,  the  thesis  considers  a  class  of  piecewise  linear 
maps  of  the  unit  interval,  members  of  which  give  rise  to  finite-state,  homogeneous  Markov 
chains.  The  thesis  establishes  ergodic  and  other  properties  of  these  maps  and  explores  the 
use  of  these  maps  as  generators  of  signals  for  practical  applications.  A  close  relation  is 
established  between  noise-corrupted  orbit  segments  generated  by  the  maps  and  outputs  of 
hidden  Markov  models,  and  this  relation  is  exploited  in  practical,  optimal  and  suboptimal 
algorithms  for  detection,  parameter  estimation,  and  state  estimation  with  the  maps. 

Thesis  Supervisor:  Alan  V.  Oppenheim 

Title:  Distinguished  Professor  of  Electrical  Engineering 
Chapter 1


Introduction 


Chaotic systems have received much attention in the mathematics and physics communities
in the last two decades, and they are receiving increasing attention in various engineering
disciplines as well. Experimental evidence suggests that these systems may be useful models
for a wide variety of physical phenomena, including turbulence, vibrations of buckled elastic
structures, and the behavior of certain feedback control devices. Traditionally, researchers have
focused on possible causes of, or transitions to, chaos, on universal properties shared by chaotic
systems, and on various topological and ergodic properties of chaotic systems. As such, the
emphasis has been on the analysis of chaotic systems and of real-world systems suspected of
exhibiting chaos. Little attention has been given to the synthesis of chaotic signals
and systems for practical engineering applications such as communication systems. In
part for this reason, useful engineering applications of chaotic systems have yet to appear.

This  thesis  considers  both  the  analysis  and  synthesis  of  chaotic  signals  and  systems. 
The  unifying  theme  of  this  thesis  is  additive  noise  and  the  problems  and  issues  involved  in 
dealing  with  chaotic  signals  embedded  in  additive  noise.  With  respect  to  analysis,  the  thesis 
focuses  on  estimating  the  state  of  discrete-time,  chaotic  systems  based  on  noise-corrupted 
observations  of  the  state. 

The  problem  of  noise-corrupted  chaotic  signals  arises  in  many  applications  as  does  the 
need  for  effective  state  estimation  algorithms.  For  example,  often  when  measurements  are 
taken  of  a  physical  phenomenon  suspected  of  being  chaotic,  the  measuring  device  introduces 
error in the recorded signal, with the error well modeled as additive white noise. Alternatively,
the actual underlying physical phenomenon may be immersed in a noisy environment,
as might be the case if one seeks to intercept a low-power, chaotic signal, possibly used for
secure  communication,  that  has  been  transmitted  over  a  noisy  channel.  In  both  cases, 
one  seeks  to  separate  the  chaotic  signal  from  the  noise  and  often  to  estimate  the  state  of 
the  underlying  chaotic  system  from  the  noise-corrupted  observations  of  the  transformed 
state.  The  thesis  considers  three  problem  scenarios  involving  noise-corrupted  observations 
of  chaos,  with  the  scenarios  distinguished  by  the  level  of  a  priori  knowledge  of  the  dynamics 
of the underlying chaotic system. State estimation algorithms are introduced for each of the
scenarios, and the performance of the algorithms is evaluated using Monte Carlo simulations.

When  attempting  to  design  and  refine  estimators  for  nonlinear  estimation  problems,  one 
often  has  no  way  of  knowing  if  mediocre  performance  of  an  estimator  is  due  to  the  estimator 
or  to  a  fundamental  aspect  of  the  problem  itself.  As  such,  it  is  often  useful  and  desirable 
to  know  the  best  performance  achievable  by  any  estimator  for  a  given  nonlinear  estimation 
problem,  or  equivalently  to  have  upper  bounds  on  achievable  state  estimator  performance. 
Consequently,  state  estimator  performance  bounds  are  derived  and  analyzed  in  the  thesis. 
The analysis reveals that intrinsic properties of chaotic systems (positive Lyapunov
exponents and boundedness of attractors) have a fundamental influence on achievable state
estimator performance with these systems.

With respect to synthesis, the thesis introduces and analyzes a class of chaotic building
blocks of potential practical value. The basic elements of this class are piecewise linear
maps of the unit interval which satisfy certain constraints. These simple maps are shown to
exhibit a rich set of properties which, among other things, allow computationally efficient
optimal and suboptimal detection as well as maximum-likelihood state estimation with the
maps.

The thesis is organized as follows. Chapters 2 and 3 respectively provide background
material on chaos and on the general state estimation problems considered in this thesis.
Specifically, Chapter 2 discusses a number of topological and ergodic properties often associated
with chaos, the relations among these properties, and the concept of dissipative
chaos. Chapter 3 begins by defining the state estimation scenarios of interest in this thesis
and continues by briefly reviewing the fundamentals of probabilistic state estimation,
with an emphasis on the two probabilistic state estimation approaches focused on in the
thesis: Maximum-Likelihood (ML) and Minimum-Mean-Squared-Error (MMSE). Next, the
optimal MMSE state estimator for linear dynamical systems with Gaussian noise, the Kalman
filter, is discussed, as well as the practical difficulties in performing optimal MMSE state
estimation with nonlinear dynamical systems. Finally, the chapter provides a summary of previous
state estimation research involving nonlinear dynamical systems in general, followed by a
more focused summary of previous state estimation research involving deterministic chaotic
systems.

Chapter 4 introduces state estimation algorithms for discrete-time chaotic systems and
time-sampled, continuous-time chaotic systems and provides performance results obtained
via Monte Carlo simulations with several chaotic systems. State estimation algorithms are
introduced for three problem scenarios: known system dynamics; unknown system dynamics
with a noise-free reference orbit available; and unknown system dynamics with no noise-free
reference orbit available. In particular, the extended Kalman filter is shown
to  yield  mediocre  performance  results,  but  a  new  variation  of  this  filter  is  shown  to  be  a 
potentially  effective  state  estimator  for  chaotic  systems,  even  when  the  system  dynamics  are 
unknown.  A  global,  approximate  MMSE  state  estimator  is  also  introduced  and  is  shown  to 
be  a  potentially  effective  state  estimator  with  input  SNRs  as  small  as  0  dB.  The  estimator 
exploits  intrinsic  properties  of  the  steady-state  behavior  of  dissipative,  chaotic  systems,  in 
particular  topological  transitivity  and  ergodicity. 

Chapter 5 derives performance bounds for a general class of state estimators and interprets
these bounds in the context of chaos. The Cramer-Rao bound for estimators of
nonrandom state vectors of deterministic chaotic systems is derived and shown to exhibit
behavior similar to that of the corresponding bound for unstable linear systems. In particular,
for dissipative chaotic diffeomorphisms the bound exhibits a nonzero asymptotic
limit when there are noisy observations of the state for only past or only future times.
Limitations of the Cramer-Rao bound are discussed, and a specialized form of a general class
of performance bounds, the Barankin bounds, is shown to overcome these limitations. The
Cramer-Rao and Weiss-Weinstein bounds for estimators of random state vectors are also
briefly considered and shown to be of little value for use with chaotic systems.

In contrast to Chapters 4 and 5, which focus on the analysis of chaotic systems and the signals
produced by these systems, Chapters 6 and 7 focus on the synthesis of chaotic systems. In
particular, Chapter 6 introduces a class of piecewise linear maps of the unit interval which
give rise to finite-state Markov chains. New and previously reported properties of these
maps are discussed, and a close relation is established between the ergodic properties of these
maps and the Markov chains they give rise to. In addition, these maps are shown to be
potentially useful building blocks for maps with arbitrary invariant probability density
functions and for higher-dimensional maps which also give rise to Markov chains. Chapter
7 derives computationally efficient optimal and suboptimal detectors for a subset of these
maps and briefly discusses the potential value of these maps and associated detectors for
secure communication. The chapter concludes by introducing optimal and suboptimal ML
state estimators for use with these maps.

Finally, Chapter 8 provides concluding remarks. The chapter begins by summarizing
the highlights of the thesis as well as its major contributions to the research community. The
chapter then discusses potentially fruitful topics for future research which build on the
results presented in the thesis.
Chapter 2


Overview  of  Chaos 


2.1  Introduction 

This chapter establishes a foundation for the state estimation algorithms introduced in
Chapter 4 and the performance bounds derived in Chapter 5 by providing a brief introduction
to chaos. We discuss a number of properties, both topological and ergodic, often
associated with chaos and explain such concepts as invariant measures, attractors, Lyapunov
exponents, topological transitivity, and sensitive dependence on initial conditions.
We omit discussion of several concepts relevant to chaos but not to this thesis, including
dimensions and entropies of attractors. These topics are discussed at varying levels
of detail in a number of references on nonlinear dynamical systems (e.g., [60, 69]).

The  discussion  of  topological  and  ergodic  properties  in  Section  2.3  is  rather  formal 
and  abstract,  and  it  uses  a  number  of  theoretical  concepts  from  mathematical  analysis 
and  topology.  Although  many  of  the  terms  introduced  in  the  section  are  used  throughout 
the  thesis,  a  thorough  understanding  of  the  theory  underlying  these  terms  is  not  needed 
in  order  to  understand  the  information  on  state  estimation  algorithms  and  performance 
bounds  provided  in  Chapters  4  and  5. 

2.2  Notational  Conventions 

We adopt the following notational conventions in this thesis. Plain lowercase and uppercase
letters such as $x$ and $f$ denote scalars and scalar-valued functions, whereas bold lowercase
and uppercase letters such as $\mathbf{x}$ and $\mathbf{f}$ denote vectors and vector-valued functions. Except
when confusion might result, functions and arguments of functions are denoted in the text
simply by the symbol representing the function or argument. For example, for the scalar
equation $y = f(x)$, we say that $x$ is the argument of the function $f$ and $y$ is the value of $f$
evaluated at $x$, or equivalently the image of $x$ under $f$.

The character $\mathcal{R}$ denotes the real line and $\mathcal{R}^n$ denotes Euclidean $n$-space. Unless stated
otherwise, all vectors have real-valued components and the domain and range of each function
are subsets of $\mathcal{R}^n$ and $\mathcal{R}^m$, respectively, for positive integers $m$ and $n$. The time index
for a time series of scalars or vectors is given in parentheses, so for example $x(n)$ denotes
the element of the time series $\{x(i)\}$ at time $n$. Given two functions $g$ and $h$, $g \circ h$ denotes
the composition of $g$ with $h$, so that $(g \circ h)(x) = g(h(x))$. Similarly, the shorthand
notation $f^n(x) = (\underbrace{f \circ \cdots \circ f}_{n \text{ times}})(x)$ denotes the composition of $f$ with itself $n$ times, and by
definition $f^0(x) = x$. We let $f^{-1}$ denote the inverse or inverse image of $f$, and $f^{-n}$ the
$n$-fold composition of $f^{-1}$. For a scalar-valued, differentiable function $f$, $f'(x)$ denotes the
derivative of $f$ evaluated at $x$, and more generally $f^{(n)}(x)$ denotes the $n$th derivative of $f$
evaluated at $x$. Similarly, for vector-valued, differentiable functions $\mathbf{f}$ with vector-valued
arguments, $D\{\mathbf{f}(\mathbf{x})\}$ denotes the derivative of $\mathbf{f}$ with respect to $\mathbf{x}$, and if the derivative
is not taken with respect to the innermost argument, we use a subscript on $D$ to denote
the differentiation variable. For example, $D\{\mathbf{f}(\mathbf{g}(\mathbf{x}))\}$ denotes the derivative of $\mathbf{f} \circ \mathbf{g}$ taken
with respect to $\mathbf{x}$, whereas $D_{\mathbf{g}(\mathbf{x})}\{\mathbf{f}(\mathbf{g}(\mathbf{x}))\}$ denotes the derivative of $\mathbf{f}$ taken with respect
to $\mathbf{g}(\mathbf{x})$.

We let $\log(x)$ denote the natural logarithm of $x$ and $\log_{10}(x)$ denote the logarithm to
the base 10 of $x$. Except when there might be confusion, we use a shorthand notation for
probability density functions (PDFs). Specifically, $p(x)$ and $p(y)$ respectively denote the
PDFs $p_x(x)$ and $p_y(y)$ (evaluated at their arguments), and $p(y|x)$ denotes $p_{y|x}(y|x)$, the
PDF of $y$ conditioned on the random variable $x$. In addition, we use the shorthand notation
$\int f\,d\mu$ to represent the Lebesgue integral of the function $f$ with respect to the measure $\mu$.

The focus in this thesis is on discrete-time, deterministic dynamical systems, also known
as deterministic maps, and time-sampled, continuous-time, deterministic dynamical systems,
also known as deterministic flows. By a deterministic map, we mean an evolution
equation specified by a function $\mathbf{f} : M \to M$ mapping some space $M$ to itself, which gives
rise to a time series $\{\mathbf{x}(i)\}$ satisfying the relation $\mathbf{x}(n+1) = \mathbf{f}(\mathbf{x}(n)) = \mathbf{f}^{n+1}(\mathbf{x}(0))$ and, if
$\mathbf{f}$ is invertible, the relation $\mathbf{x}(0) = \mathbf{f}^{-n}(\mathbf{x}(n))$. The last two equalities emphasize the fact
that for a deterministic system, the value of the time series at any time uniquely determines
the value at all future times, and at all past times as well if the system is invertible. We
refer to $\mathbf{f}$ as a map, the time series $\{\mathbf{x}(i)\}$ as an orbit, and $\mathbf{x}(n)$ as either the state of $\mathbf{f}$
at time $n$ or the orbit point at time $n$. To maintain an absolute time reference, we reserve
the designation initial state or initial condition for $\mathbf{x}(0)$, the state at time 0, even if our
interest is with $\mathbf{x}(n)$ for times $n < 0$. We also define an $N$-point orbit segment to be a set
of $N$ consecutive points from an orbit, and the orbit generated by $\mathbf{x}(0)$ to be the unique
one-sided orbit $\{\mathbf{x}(i)\}_{i=0}^{\infty}$ for noninvertible maps and the unique two-sided orbit
$\{\mathbf{x}(i)\}_{i=-\infty}^{\infty}$ for invertible maps associated with a specific point $\mathbf{x}(0)$ at time 0.
An orbit $\mathcal{O}$ is periodic with period $n$ if for each $\mathbf{x} \in \mathcal{O}$, $(\mathbf{f}^n)^i(\mathbf{x}) = \mathbf{x}$ for all
nonnegative integers $i$ and some positive integer $n$. Note that a periodic orbit has only a
finite number of unique points. We refer to points on periodic orbits as periodic points.
An orbit point $\mathbf{x}$ is asymptotically periodic if there exists a positive integer $N$ such that
the point $\mathbf{f}^N(\mathbf{x})$ is periodic.
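As an illustration of these definitions, the following Python sketch computes an $N$-point orbit segment and the iterate $f^n$, and tests for periodic and asymptotically periodic points. The logistic map $f(x) = 4x(1-x)$ and the helper names (`orbit_segment`, `iterate`, `is_periodic_point`, `is_asymptotically_periodic`) are hypothetical choices made for this example, not taken from the thesis. The tolerance `tol` stands in for exact equality, which floating-point arithmetic cannot test reliably, and the checks are meaningful only for small $n$, because rounding errors grow under a chaotic map.

```python
import math


def logistic(x, r=4.0):
    """Logistic map f(x) = r x (1 - x); r = 4 is an illustrative choice."""
    return r * x * (1.0 - x)


def iterate(f, n):
    """Return f^n, the n-fold composition of f with itself (f^0 is the identity)."""
    def f_n(x):
        for _ in range(n):
            x = f(x)
        return x
    return f_n


def orbit_segment(f, x0, n_points):
    """Return the N-point orbit segment [x(0), ..., x(N-1)] generated by x0."""
    segment = [x0]
    for _ in range(n_points - 1):
        segment.append(f(segment[-1]))
    return segment


def is_periodic_point(f, x, n, tol=1e-12):
    """Test f^n(x) = x, with a tolerance standing in for exact equality."""
    return abs(iterate(f, n)(x) - x) < tol


def is_asymptotically_periodic(f, x, period, max_transient, tol=1e-12):
    """Search N = 1, ..., max_transient for which f^N(x) satisfies f^period(y) = y."""
    return any(is_periodic_point(f, iterate(f, N)(x), period, tol)
               for N in range(1, max_transient + 1))


# (5 - sqrt(5)) / 8 lies on the period-2 orbit {(5 - sqrt(5))/8, (5 + sqrt(5))/8}.
PERIOD_TWO_POINT = (5.0 - math.sqrt(5.0)) / 8.0

print(orbit_segment(logistic, PERIOD_TWO_POINT, 4))
```

For example, $x = 1/4$ is asymptotically periodic under this map because $f(1/4) = 3/4$ is a fixed point, whereas $(5-\sqrt{5})/8$ is periodic with period 2 but is not a fixed point.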

By a deterministic flow, we mean an evolution equation specified by a function $\mathbf{F} :
M \to M$ mapping some space $M$ to itself, which gives rise to continuous-time waveforms
$\mathbf{x}(t)$, $t \in \mathcal{R}$, which satisfy the differential equation

$$\frac{d\mathbf{x}(t)}{dt} = \dot{\mathbf{x}}(t) = \mathbf{F}(\mathbf{x}(t)). \tag{2.1}$$

In this thesis we are primarily interested in the maps to which flows give rise by sampling
$\mathbf{x}(t)$ every $T$ seconds. That is, the resulting time series $\{\mathbf{y}(i)\}$, where $\mathbf{y}(n) = \mathbf{x}(nT)$, is an
orbit of the deterministic map $\mathbf{f}_T$ defined as

$$\mathbf{x}((n+1)T) = \mathbf{f}_T(\mathbf{x}(nT)) = \mathbf{x}(nT) + \int_{nT}^{(n+1)T} \mathbf{F}(\mathbf{x}(t))\,dt. \tag{2.2}$$

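Note that in contrast to arbitrary maps, which may or may not be invertible, the maps
to which time-sampled flows give rise are always invertible, with $\mathbf{f}_T^{-1} = \mathbf{f}_{-T}$, provided that
solutions of (2.1) exist and are unique both forward and backward in time, as they are, for
example, when $\mathbf{F}$ is continuously differentiable and solutions remain bounded. Continuity of
$\mathbf{F}$ alone does not guarantee uniqueness.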
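A time-sampled flow can be approximated numerically by integrating (2.1) over one sampling interval. The sketch below assumes the Lorenz equations with their commonly used parameter values (10, 28, 8/3) purely as an example of $\mathbf{F}$, and uses fixed-step fourth-order Runge-Kutta integration in place of the exact integral in (2.2). Integrating the reversed field $-\mathbf{F}$ approximates the inverse map $\mathbf{f}_{-T}$. The function names are illustrative and not from the thesis.

```python
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side F(x) of the Lorenz equations (an illustrative flow)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(F, state, h):
    """One classical fourth-order Runge-Kutta step of size h for dx/dt = F(x)."""
    def add(a, b, c):
        # Componentwise a + c * b for state tuples.
        return tuple(ai + c * bi for ai, bi in zip(a, b))
    k1 = F(state)
    k2 = F(add(state, k1, 0.5 * h))
    k3 = F(add(state, k2, 0.5 * h))
    k4 = F(add(state, k3, h))
    return tuple(s + (h / 6.0) * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def time_sampled_map(F, T, substeps=1000):
    """Approximate f_T, which advances a state by T time units along the flow of F."""
    h = T / substeps
    def f_T(state):
        for _ in range(substeps):
            state = rk4_step(F, state, h)
        return state
    return f_T


def reversed_field(F):
    """Vector field -F; its time-T map approximates the inverse map f_{-T}."""
    return lambda state: tuple(-v for v in F(state))


f_T = time_sampled_map(lorenz, 0.1)
x1 = f_T((1.0, 1.0, 1.0))
print(x1, time_sampled_map(reversed_field(lorenz), 0.1)(x1))
```

Running the sampled map forward and then backward over the same interval should return the starting state up to integration and rounding error, which is the numerical counterpart of the invertibility noted above.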


2.3  The  Distinguishing  Properties  of  Chaos 

Deterministic chaotic systems, referred to simply as chaotic systems in the thesis, are nonlinear,
deterministic dynamical systems, either discrete-time or continuous-time, which satisfy
a certain set of properties; however, there is no universal agreement as to what these properties
should be. As a result, there is no single definition of deterministic chaos, but instead
several closely related definitions. In this section we discuss topological and ergodic properties
often associated with or required for deterministic chaos, discuss the relations among
these properties, and cite those properties either satisfied by or believed to be satisfied by
the systems considered in this thesis. We only discuss properties relevant to discrete-time
and time-sampled continuous-time systems. Topological properties are discussed first, since
they are conceptually easier to understand than ergodic properties.

2.3.1  Topological  Properties 

To discuss the topological properties of chaos, one must first specify a topology [62] on the
space $M$ on which the system is defined. For the unit-interval maps considered in Chapters
6 and 7, $M$ is the unit interval and the topology on $M$ is the subspace topology induced by the
metric topology on $\mathcal{R}$. In other words, a basis element for the topology is simply the
intersection of an open interval in $\mathcal{R}$ with $M$. For the dissipative maps considered in Chapters
4 and 5, $M$ and its associated topology are not as easily defined, since what is of interest is the
steady-state dynamics of such systems, and these dynamics typically evolve onto attractors
which are not simple subsets of $\mathcal{R}^n$. In the discussion that follows, we implicitly assume
that an appropriately defined metric gives rise to the underlying topology.

The three topological properties often required of a dynamical system for it to be considered
chaotic are sensitive dependence on initial conditions, topological transitivity, and
a dense set of periodic orbits [20]. We briefly consider each in turn.

Most  definitions  of  chaos  require  that  there  be  sensitive  dependence  on  initial  conditions, 
a  formal  definition  of  which  is  the  following  [20]: 

Sensitive Dependence on Initial Conditions: A discrete-time system or map $f : M \to M$
has sensitive dependence on initial conditions if there exists a constant $\delta > 0$ such that
for any $x \in M$ and any neighborhood $U$ of $x$, there exist a $y \in U$ and an integer $n > 0$
such that $|f^n(x) - f^n(y)| > \delta$.

In other words, for there to be sensitive dependence on initial conditions, there must be
a positive constant $\delta$ such that for any point $x$ in $M$ and any arbitrarily small open ball
containing $x$, one can always find another point $y$ in that ball such that the distance between
corresponding points on the orbits generated by $x$ and $y$ eventually exceeds $\delta$. Intuitively,
this means that there is a local separation or divergence of the two orbits. However, the
conditions of the definition are still satisfied even if these orbits converge after time $n$.
Also,  the  definition  does  not  require  that  the  orbits  generated  by  all  points  in  a  given 
neighborhood  of  x  diverge  from  that  generated  by  x.  In  fact,  for  many  chaotic  systems 
such  as  the  dissipative  systems  considered  in  Chapter  4,  one  can  generally  find  points  in 
each  neighborhood  of  x  for  which  the  orbits  generated  by  these  points  converge  to  the  one 
generated  by  x. 
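Because the definition quantifies over all points and all neighborhoods, it cannot be verified numerically; a finite-horizon check for one pair of nearby points can still illustrate it. The sketch below assumes the logistic map $f(x) = 4x(1-x)$, an illustrative choice not taken from this section, and reports the first time two orbits are more than $\delta$ apart. The function name `first_separation_time` is hypothetical.

```python
def logistic(x):
    """Logistic map f(x) = 4x(1 - x), used here only as an example."""
    return 4.0 * x * (1.0 - x)


def first_separation_time(f, x, y, delta, max_steps=1000):
    """Return the first n >= 0 with |f^n(x) - f^n(y)| > delta, or None if no such
    n <= max_steps exists. This checks one pair of points over a finite horizon only."""
    for n in range(max_steps + 1):
        if abs(x - y) > delta:
            return n
        x, y = f(x), f(y)
    return None


n = first_separation_time(logistic, 0.3, 0.3 + 1e-10, delta=0.1)
print("orbits 1e-10 apart first differ by more than 0.1 at n =", n)
```

For this map $|f(a) - f(b)| = 4|a-b|\,|1-a-b| \le 4|a-b|$ on $[0,1]$, so orbits starting $10^{-10}$ apart need at least 15 iterations before their separation can exceed 0.1.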

An  unstable,  linear  system  exhibits  sensitive  dependence  on  initial  conditions,  but  such 
a  system  is  not  considered  chaotic.  As  such,  additional  topological  properties  are  generally 
required of a system for it to be considered chaotic. One such property, not shared by linear
systems unless the space $M$ consists only of the origin, is topological transitivity, which can
be defined as follows [20]:

Topological Transitivity: A map $f : M \to M$ is topologically transitive if for any pair
of nonempty open sets $U, V \subset M$, there exists a positive integer $k$ such that
$f^k(U) \cap V \neq \emptyset$, where $\emptyset$ is the empty set.


A less abstract definition, applicable when $f$ is continuous and $M$ is compact, is the following
[87]:


Topological Transitivity: A continuous transformation $f : M \to M$ is topologically
transitive if there is some $x \in M$ for which the orbit generated by $x$, $O_f(x) = \{f^k(x) \mid k \ge 0\}$,
is dense in $M$. That is, for any $y \in M$ and any open set $U$ containing $y$, there exists a
positive integer $k$ such that $f^k(x) \in U$.


Intuitively, this latter definition simply means that the orbit generated by $x$ comes arbitrarily
close to every point in $M$. (More precisely, the above definitions are those for one-sided
topological transitivity as opposed to two-sided topological transitivity, a concept applicable
only to invertible systems.)
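The dense-orbit definition suggests a simple empirical check: partition $M$ into small cells and record which cells a long orbit segment enters. In the hypothetical sketch below, $M = [0,1]$, the map is the logistic map $f(x) = 4x(1-x)$, and the cell count and orbit length are arbitrary choices. Visiting every cell is consistent with, but does not prove, the existence of a dense orbit.

```python
def logistic(x):
    """Logistic map f(x) = 4x(1 - x), used here only as an example."""
    return 4.0 * x * (1.0 - x)


def unvisited_cells(f, x0, n_cells, n_steps):
    """Split [0, 1] into n_cells equal cells and return the indices of the cells that
    the orbit segment x(0), ..., x(n_steps - 1) never enters."""
    visited = [False] * n_cells
    x = x0
    for _ in range(n_steps):
        k = min(int(x * n_cells), n_cells - 1)  # put x = 1.0 in the last cell
        visited[k] = True
        x = f(x)
    return [k for k, hit in enumerate(visited) if not hit]


print(len(unvisited_cells(logistic, 0.1234, 100, 100_000)), "cells never visited")
```

By contrast, the orbit generated by the fixed point $x = 3/4$ enters only one of the 100 cells, as expected for an orbit that is not dense.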

One consequence of topological transitivity is that it prevents $M$ from being decomposable.
That is, if $f$ is topologically transitive on $M$, then one cannot divide $M$ into two
disjoint, nonempty open subsets $M_1$ and $M_2$ such that $f(M_1) \subset M_1$ and $f(M_2) \subset M_2$.
As suggested by this indecomposability constraint and discussed in Section 2.3.2, ergodicity and
topological transitivity are closely related. In addition, if $f$ is topologically transitive on $M$, one can
show that if one point has a dense orbit then almost all points in $M$ have dense orbits [87].
In Chapter 4 we exploit this property of chaotic systems to derive simple, yet potentially
effective state estimation algorithms.

A  third  topological  property  sometimes  required  of  a  system  for  it  to  be  considered 
chaotic  is  that  there  be  a  dense  set  of  periodic  points.  Although  many  if  not  all  of  the 
systems  considered  in  this  thesis  have  this  property,  it  is  not  a  property  which  we  either 
exploit or emphasize. We do not consider it in the context of state estimation in Chapters
4 and 5 and only briefly consider it in the context of signal synthesis in Chapters 6 and 7.

2.3.2  Ergodic  Properties 

Whereas a discussion of the topological properties associated with chaos requires that a
topology first be defined, a discussion of the ergodic properties associated with chaos requires
that a measure space $(X, \beta, \mu)$ be defined, where $X$ is a set, $\beta$ is a $\sigma$-algebra of subsets of
$X$, and $\mu$ is a measure on $\beta$. We only consider probability spaces in this thesis, which are
those spaces satisfying $\mu(X) = 1$. One example of a probability space used extensively
in Chapters 6 and 7 consists of $X$ being the unit interval, $\beta$ being the Borel $\sigma$-algebra
(which by definition is the smallest $\sigma$-algebra containing all open subintervals of the unit
interval), and $\mu$ being Lebesgue measure (the unique measure for which the measure of any
subinterval is simply the length of the subinterval). In contrast, for the dissipative maps
considered in Chapters 4 and 5, the underlying probability space is rather nebulous, as the
steady-state system dynamics occur on a complicated (fractal) attractor, which is singular
with respect to Lebesgue measure.

A transformation or map $f : X \to X$ (i.e., a mapping from $X$ to itself) is measurable
if $f^{-1}(B) \in \beta$ for every $B \in \beta$, where $f^{-1}$ denotes the inverse image of $f$. A measurable
transformation is nonsingular if $\mu(f^{-1}(B)) = 0$ for every $B \in \beta$ such that $\mu(B) = 0$.
A nonsingular, measurable transformation is measure-preserving if $\mu(f^{-1}(B)) = \mu(B)$ for
every $B \in \beta$. If, in addition to being measure-preserving, the map has the property that
the only sets $B$ in $\beta$ for which $f^{-1}(B) = B$ have measure zero or one, then $f$ is ergodic
(with respect to the measure $\mu$). In other words, an ergodic map is one for which the only
invariant sets are those that contain either all the probability or none of it.
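For a concrete measure-preserving transformation, the logistic map $f(x) = 4x(1-x)$ preserves the probability measure $\mu$ with density $1/(\pi\sqrt{x(1-x)})$ on $[0,1]$, whose distribution function is $(2/\pi)\arcsin\sqrt{x}$. The sketch below, an illustration chosen for this discussion rather than an example from the thesis, checks $\mu(f^{-1}(B)) = \mu(B)$ for intervals $B = [a,b]$, using the fact that $f^{-1}([a,b])$ is the union of two intervals, one on each monotone branch of $f$. The function names are hypothetical.

```python
import math


def arcsine_measure(a, b):
    """mu([a, b]) for the probability density 1 / (pi sqrt(x (1 - x))) on [0, 1]."""
    cdf = lambda x: (2.0 / math.pi) * math.asin(math.sqrt(x))
    return cdf(b) - cdf(a)


def lebesgue_measure(a, b):
    """Length of the interval [a, b]."""
    return b - a


def logistic_preimage(a, b):
    """f^{-1}([a, b]) for f(x) = 4x(1 - x) and 0 <= a <= b <= 1, as two intervals."""
    left = lambda y: 0.5 * (1.0 - math.sqrt(1.0 - y))   # inverse of f on [0, 1/2]
    right = lambda y: 0.5 * (1.0 + math.sqrt(1.0 - y))  # inverse of f on [1/2, 1]
    return [(left(a), left(b)), (right(b), right(a))]


def preimage_measure(a, b, measure):
    """Measure of f^{-1}([a, b]) for the logistic map f and the given interval measure."""
    return sum(measure(c, d) for c, d in logistic_preimage(a, b))


for a, b in [(0.1, 0.3), (0.25, 0.9)]:
    print(a, b, arcsine_measure(a, b), preimage_measure(a, b, arcsine_measure))
```

Running the same check with `lebesgue_measure` shows that Lebesgue measure is not preserved by this map: for $B = [0, 1/2]$ the preimage has length $1 - 1/\sqrt{2} \approx 0.29$ rather than $0.5$.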

For probability spaces, one consequence of ergodicity is that ensemble averages and
infinite time averages correspond, in the sense that the following relation holds for all
functions $g \in L^1(\mu)$ (where $L^1(\mu)$ denotes the set of all absolutely integrable, real-valued
functions on $(X, \beta, \mu)$):

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} g(f^i(x)) = \int g\,d\mu \tag{2.3}$$

for almost all $x$. This relation suggests that ergodicity and topological transitivity are
related, and indeed this is the case. As shown in [34], if $f$ is ergodic on the measure space
$(X, \beta, \mu)$ and there exists a topology on $X$ with a countable basis such that every nonempty
open set has nonzero measure, then $f$ is topologically transitive with respect to this topology.
Thus, ergodicity implies topological transitivity under these conditions. However, an ergodic
system need not have sensitive dependence on initial conditions (an irrational rotation of the
circle is one example), and thus not all ergodic systems are chaotic.
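Relation (2.3) can be illustrated by comparing a finite-$n$ time average with the corresponding ensemble average. The sketch assumes the logistic map $f(x) = 4x(1-x)$ with its invariant density $1/(\pi\sqrt{x(1-x)})$, for which $\int x\,d\mu = 1/2$ and $\int x^2\,d\mu = 3/8$; the map, the initial condition, and the orbit length are illustrative choices, not taken from the thesis.

```python
def logistic(x):
    """Logistic map f(x) = 4x(1 - x), used here only as an example."""
    return 4.0 * x * (1.0 - x)


def time_average(f, g, x0, n):
    """(1/n) * sum_{i=0}^{n-1} g(f^i(x0)): the left-hand side of (2.3) truncated at n."""
    total, x = 0.0, x0
    for _ in range(n):
        total += g(x)
        x = f(x)
    return total / n


# Ensemble averages under the invariant density 1 / (pi sqrt(x (1 - x))):
# E[x] = 1/2 and E[x^2] = 3/8.
print(time_average(logistic, lambda x: x, 0.1234, 200_000), 0.5)
print(time_average(logistic, lambda x: x * x, 0.1234, 200_000), 0.375)
```

A piecewise linear map with slope 2, such as the tent map, would be a natural alternative, but in binary floating point its computed orbits collapse onto 0 after a few dozen iterations, which is why the logistic map is used here.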

For the problems considered in this thesis, the map $\mathbf{f} : M \to M$ is given and we seek
a $\sigma$-algebra $\beta$ on $M$ and a probability measure $\mu$ on $\beta$ such that $\mathbf{f}$ is measure-preserving
and, more importantly, ergodic. The measure of interest in Chapters 4 and 5 is the unique,
ergodic, so-called physical measure [21], which is defined in the following limiting sense.
Consider the noise-driven system

$$\mathbf{x}(n+1) = \mathbf{f}(\mathbf{x}(n)) + \epsilon\,\mathbf{w}(n). \tag{2.4}$$

Under certain conditions this system has a unique invariant measure $\mu_\epsilon$ when $\{\mathbf{w}(i)\}$ is a
white-noise sequence and $\epsilon$ is a small positive constant. The physical measure is given by
$\lim_{\epsilon \to 0} \mu_\epsilon$.
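The noise-driven system (2.4) is straightforward to simulate. Under the conditions that make $\mu_\epsilon$ unique, a histogram of a long simulated orbit approximates $\mu_\epsilon$, and repeating the experiment for decreasing $\epsilon$ suggests the physical measure. The sketch below assumes the Hénon map with its standard parameters $a = 1.4$, $b = 0.3$ as an example of $\mathbf{f}$, and i.i.d. zero-mean, unit-variance Gaussian components for $\mathbf{w}(n)$; neither choice comes from this section, and the limiting procedure itself is not implemented.

```python
import random


def henon(state, a=1.4, b=0.3):
    """Henon map (x, y) -> (1 - a x^2 + y, b x), used here only as an example."""
    x, y = state
    return (1.0 - a * x * x + y, b * x)


def noisy_orbit(f, x0, n, eps, seed=0):
    """Return n points of x(i + 1) = f(x(i)) + eps * w(i), where w(i) are independent
    vectors of zero-mean, unit-variance Gaussian components (an assumed noise model)."""
    rng = random.Random(seed)
    orbit = [x0]
    for _ in range(n - 1):
        fx = f(orbit[-1])
        orbit.append(tuple(c + eps * rng.gauss(0.0, 1.0) for c in fx))
    return orbit
```

Setting `eps = 0` recovers the deterministic orbit of $\mathbf{f}$, and fixing `seed` makes a given noise realization reproducible, which is convenient when comparing histograms for several values of $\epsilon$.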

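The value of an ergodic measure for $\mathbf{f}$ arises from the fact that if such a measure $\mu$ exists,
then by the Multiplicative Ergodic Theorem of Oseledec [21], the following limit exists for
$\mu$-almost every $\mathbf{x}$ (i.e., for all $\mathbf{x}$ except on a set of $\mu$-measure zero):

$$\Lambda_{\mathbf{x}} = \lim_{n \to \infty} \left( D\{\mathbf{f}^n(\mathbf{x})\}^T D\{\mathbf{f}^n(\mathbf{x})\} \right)^{1/(2n)}, \tag{2.5}$$

and the eigenvalues of $\Lambda_{\mathbf{x}}$ are $\mu$-almost everywhere constant. The logarithms of these
eigenvalues are known as the Lyapunov exponents of $\mathbf{f}$ with respect to the measure $\mu$. In
addition, for $\mu$-almost all $\mathbf{x} \in M$, if $\lambda_1 > \lambda_2 > \cdots > \lambda_r$ denote the ordered Lyapunov
exponents not repeated by multiplicity, and $E_{\mathbf{x}}^i$ denotes the subspace of $\mathcal{R}^N$ ($N$ being the
state dimension) spanned by the eigenvectors of $\Lambda_{\mathbf{x}}$ associated with all eigenvalues with
logarithms that are less than or equal to $\lambda_i$, then the following holds:

$$\lim_{n \to \infty} \frac{1}{n} \log \left\| D\{\mathbf{f}^n(\mathbf{x})\}\,\mathbf{u} \right\| = \lambda_i \tag{2.6}$$

for each unit vector $\mathbf{u} \in E_{\mathbf{x}}^i \setminus E_{\mathbf{x}}^{i+1}$, where $\|\cdot\|$ denotes the Euclidean norm and
$E_{\mathbf{x}}^i \setminus E_{\mathbf{x}}^{i+1}$ denotes the set formed by taking $E_{\mathbf{x}}^i$ and removing the subspace $E_{\mathbf{x}}^{i+1}$.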
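Lyapunov exponents are rarely computed from (2.5) directly, since the entries of $D\{\mathbf{f}^n(\mathbf{x})\}$ grow or shrink exponentially. A standard alternative, not described in this section, propagates an orthonormal set of tangent vectors with the Jacobian and re-orthonormalizes it (Gram-Schmidt, i.e., a QR step) at every iteration; the averaged logarithms of the normalization factors estimate the exponents. The sketch below applies this technique to the Hénon map with $a = 1.4$, $b = 0.3$, an illustrative choice. Because the Hénon Jacobian has constant determinant $-b$, the two estimates must sum to $\log 0.3$, which provides a built-in consistency check.

```python
import math


def henon(state, a=1.4, b=0.3):
    """Henon map (x, y) -> (1 - a x^2 + y, b x), used here only as an example."""
    x, y = state
    return (1.0 - a * x * x + y, b * x)


def henon_jacobian(state, a=1.4, b=0.3):
    """Jacobian D{f(x)} of the Henon map as a 2 x 2 tuple of rows."""
    x, _ = state
    return ((-2.0 * a * x, 1.0), (b, 0.0))


def lyapunov_exponents_2d(f, jac, x0, n, n_transient=1000):
    """Estimate both Lyapunov exponents of a planar map by propagating two tangent
    vectors with the Jacobian and re-orthonormalizing them (Gram-Schmidt) each step."""
    x = x0
    for _ in range(n_transient):  # let the orbit settle onto the attractor
        x = f(x)
    q1, q2 = (1.0, 0.0), (0.0, 1.0)
    sum1 = sum2 = 0.0
    for _ in range(n):
        J = jac(x)
        v1 = (J[0][0] * q1[0] + J[0][1] * q1[1], J[1][0] * q1[0] + J[1][1] * q1[1])
        v2 = (J[0][0] * q2[0] + J[0][1] * q2[1], J[1][0] * q2[0] + J[1][1] * q2[1])
        r11 = math.hypot(*v1)                        # stretching of the first direction
        q1 = (v1[0] / r11, v1[1] / r11)
        r12 = q1[0] * v2[0] + q1[1] * v2[1]
        w = (v2[0] - r12 * q1[0], v2[1] - r12 * q1[1])
        r22 = math.hypot(*w)                         # stretching orthogonal to q1
        q2 = (w[0] / r22, w[1] / r22)
        sum1 += math.log(r11)
        sum2 += math.log(r22)
        x = f(x)
    return sum1 / n, sum2 / n


l1, l2 = lyapunov_exponents_2d(henon, henon_jacobian, (0.0, 0.0), 100_000)
print(l1, l2, l1 + l2, math.log(0.3))
```

The largest estimate is positive and close to the commonly reported value of about 0.42 for these parameters.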

Consider a linear, time-invariant, deterministic system $f$ whose matrix representation has no zero-valued eigenvalues. For such a system, the Lyapunov exponents are the logarithms of the magnitudes of the eigenvalues. The multiplicative ergodic theorem then implies two things. First, the $1/n$-th roots of the singular values of $f^n$ (equivalently, the $1/(2n)$-th roots of the eigenvalues of $(f^n)^T f^n$) converge to the magnitudes of the eigenvalues of $f$. Second, the subspace spanned by the right singular vectors corresponding to the $m$ smallest singular values of $f^n$ converges to the subspace spanned by the eigenvectors associated with the $m$ smallest-magnitude eigenvalues of $f$, for $m = 1, \ldots, \mathcal{N}$.
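For the linear case these statements are easy to check numerically. The sketch below uses an arbitrarily chosen, non-normal, upper-triangular matrix with eigenvalues 2 and 0.5 (an illustrative assumption). It computes the singular value decomposition of $f^n$ with NumPy and makes two comparisons. The $1/n$-th roots of the singular values are compared with the eigenvalue magnitudes. The right singular vector for the smallest singular value is compared with the eigenvector for the eigenvalue 0.5. A moderate $n$ is used because the smallest singular value of $f^n$ shrinks like $0.5^n$ and would eventually be swamped by round-off.

```python
import numpy as np

# Illustrative linear system f(x) = A x with eigenvalues 2 and 0.5.
A = np.array([[2.0, 1.0],
              [0.0, 0.5]])


def singular_value_roots(A, n):
    """Return the 1/n-th roots of the singular values of A^n and its right singular vectors."""
    An = np.linalg.matrix_power(A, n)
    _, s, vh = np.linalg.svd(An)  # singular values come back in descending order
    return s ** (1.0 / n), vh


if __name__ == "__main__":
    roots, vh = singular_value_roots(A, 20)
    print("sigma_i(A^n)^(1/n):", roots)  # approaches [2, 0.5]
    print("|eigenvalues|:", np.sort(np.abs(np.linalg.eigvals(A)))[::-1])
    v_small = vh[-1]  # right singular vector for the smallest singular value
    e_small = np.array([1.0, -1.5]) / np.hypot(1.0, 1.5)  # eigenvector for eigenvalue 0.5
    print("|cos angle|:", abs(v_small @ e_small))  # close to 1
```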

The relation given by (2.6) suggests the following. Let $u \in E_x^i \setminus E_x^{i+1}$ be a unit vector and let $y = x + \delta u$, where $\delta$ is a small positive constant. Then

$$\|f^n(x) - f^n(y)\| \approx \|D\{f^n(x)\}\, \delta u\| \approx \delta \exp(n\lambda_i) \tag{2.7}$$

If the approximation were accurate, it would imply that $f$ has sensitive dependence on initial conditions whenever the largest Lyapunov exponent $\lambda_1$ is positive. This approximation is frequently cited in the literature, but it is accurate only when $\delta$ is infinitesimally small, so it is a poor approximation for practical purposes. For any fixed $\delta > 0$, $\|f^n(x) - f^n(y)\|$ remains bounded for orbits on a bounded attractor, whereas $\delta \exp(n\lambda_1)$ grows without bound. It is therefore unclear whether the existence of a positive Lyapunov exponent implies sensitive dependence on initial conditions, although experimentally, at least, this appears to be the case. In addition, since Lyapunov exponents are defined only for ergodic systems, a system with a positive Lyapunov exponent is topologically transitive. This holds with respect to a reasonable topology, e.g., one for which each nonempty open set has nonzero measure.
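The gap between (2.7) and finite perturbations is easy to see numerically. The sketch below follows two Henon orbits (Section 2.4) whose initial conditions differ by $\delta = 10^{-8}$. It compares their separation with the linear prediction $\delta\exp(n\lambda_1)$. The value $\lambda_1 \approx 0.42$ is an assumed, approximate exponent for this map, and the initial condition and horizon are illustrative choices. The separation grows for a while and then saturates at the size of the attractor, while the linear prediction keeps growing.

```python
import math

LAMBDA_1 = 0.42  # assumed approximate largest Lyapunov exponent of the Henon map


def henon(x1, x2, a=1.4, b=0.3):
    """Henon map, equations (2.8)-(2.9)."""
    return 1.0 - a * x1 * x1 + x2, b * x1


def separation_history(delta=1e-8, n_steps=100, n_transient=1_000):
    """Distances ||f^n(x) - f^n(y)|| for n = 1..n_steps, with y = x + delta * (1, 0)."""
    x1, x2 = 0.0, 0.0
    for _ in range(n_transient):  # start from a point near the attractor
        x1, x2 = henon(x1, x2)
    y1, y2 = x1 + delta, x2
    seps = []
    for _ in range(n_steps):
        x1, x2 = henon(x1, x2)
        y1, y2 = henon(y1, y2)
        seps.append(math.hypot(x1 - y1, x2 - y2))
    return seps


if __name__ == "__main__":
    seps = separation_history()
    for n in (10, 30, 50, 100):
        print(n, seps[n - 1], 1e-8 * math.exp(n * LAMBDA_1))
```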

Most, if not all, definitions of chaos that include ergodic considerations share a fundamental criterion. For a system to be considered chaotic, there must be a positive Lyapunov exponent defined on a nontrivial measure space. The dissipative systems considered in Chapters 4 and 5 satisfy this condition, as do a subset of the systems considered in Chapters 6 and 7. All these definitions also involve topological considerations, implicitly or explicitly. The reason is that the definition of Lyapunov exponents assumes the existence of a metric, and thus a metric topology, on the underlying measure space. Thus, although one can define chaos solely in terms of topological properties, as is done in [20], any definition of chaos involving ergodic considerations also involves topological considerations.

2.3.3  Dissipative  Chaos 

The next two chapters deal with dissipative diffeomorphisms that are either chaotic or suspected of being chaotic. Of all chaotic systems, these are perhaps the most interesting and the most representative of real-world phenomena, yet the least understood. In particular, the focus is on maps or time-sampled flows $f : S \to S$ with the following properties. The map $f$ is invertible, and both $f$ and $f^{-1}$ have continuous derivatives. The set $S$, the state space or phase space, is a simply connected (open) subset of $\mathbb{R}^{\mathcal{N}}$ for some $\mathcal{N}$. A dissipative map is nonrigorously defined as one that contracts volumes in state space. A conservative system, in contrast, preserves state-space volumes [21].
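For a map, volume contraction can be checked through the Jacobian determinant. In one step, a small region around $x$ has its volume multiplied by approximately $|\det D\{f(x)\}|$. The sketch below estimates the Jacobians of the Henon and Ikeda maps of Section 2.4 by central finite differences at a few randomly chosen points, then evaluates their determinants. The step size and sample points are assumptions. For these two maps the determinant is in fact constant: $-0.3$ for the Henon map and $0.9^2 = 0.81$ for the Ikeda map. Both maps therefore contract volumes everywhere.

```python
import numpy as np


def henon(x, a=1.4, b=0.3):
    """Henon map (2.8)-(2.9) acting on a length-2 array."""
    return np.array([1.0 - a * x[0] ** 2 + x[1], b * x[0]])


def ikeda(x, u=0.9):
    """Ikeda map (2.10)-(2.12) acting on a length-2 array."""
    t = 0.4 - 6.0 / (1.0 + x[0] ** 2 + x[1] ** 2)
    c, s = np.cos(t), np.sin(t)
    return np.array([1.0 + u * (x[0] * c - x[1] * s), u * (x[0] * s + x[1] * c)])


def jacobian_fd(f, x, h=1e-6):
    """Central-difference approximation of the Jacobian matrix Df(x)."""
    n = len(x)
    J = np.empty((n, n))
    for k in range(n):
        e = np.zeros(n)
        e[k] = h
        J[:, k] = (f(x + e) - f(x - e)) / (2.0 * h)
    return J


rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(5, 2))
henon_dets = [np.linalg.det(jacobian_fd(henon, p)) for p in points]
ikeda_dets = [np.linalg.det(jacobian_fd(ikeda, p)) for p in points]

if __name__ == "__main__":
    print(henon_dets)  # each close to -0.3
    print(ikeda_dets)  # each close to 0.81
```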

We briefly discuss dissipative, chaotic maps and their properties in this section. These maps and their properties remain poorly understood, and a rigorous discussion of them requires mathematical concepts beyond the scope of this thesis. Only a brief, nonrigorous discussion is therefore provided here. The reader is cautioned that some of the information, much of which was extracted from [21, 33], remains the subject of debate among theoreticians.

One interesting property of dissipative, chaotic systems is that their dynamics are nontrivial, even though they are deterministic and contract volumes. In particular, they often give rise to complicated attractors known as strange attractors. Intuitively, an attractor is a bounded region of phase space in which the points on orbits generated by initial conditions near some attracting set accumulate as $n$ grows large. An attracting set can be formally defined as follows:

Attracting Set: An attracting set $A$ with fundamental neighborhood $U$ is a compact set (in phase space) with two properties. It is invariant, i.e., $f(A) = A$. In addition, every open set $V$ containing $A$ satisfies $f^n(U) \subset V$ for $n$ large enough.

In other words, for an attracting set $A$, orbits generated by points inside $A$ remain in $A$ for all time. In addition, orbits generated by points in a certain open set containing $A$ either eventually enter and remain in $A$ or converge to $A$.

A  slightly  more  rigorous  definition  of  an  attractor  is  an  attracting  set  which  contains 
a  dense  orbit,  thereby  implying  that  the  system  f  which  gives  rise  to  the  attractor  is 
topologically  transitive  when  restricted  to  this  attracting  set  [33,  p.  36].  A  strange  attractor 
can  be  nonrigorously  defined  as  an  attractor  for  which  there  is  sensitive  dependence  on  initial 
conditions,  with  a  more  rigorous  definition  being  an  attractor  which  contains  a  transversal 
homoclinic  orbit  [33].  An  explanation  of  a  transversal  homoclinic  orbit  is  beyond  the  scope  of 
this  thesis;  its  relevance  to  this  discussion  is  that  its  existence  leads  to  orbits  with  nontrivial 
behavior,  with  typical  orbits  often  giving  rise  to  complicated,  fractal  patterns. 

The relevance of attractors and strange attractors to this thesis is threefold. First, given an attractor, one can find an open set in $\mathbb{R}^{\mathcal{N}}$ such that the orbit of almost every point in this
set  converges  to  the  attractor.  Second,  if  there  is  sensitive  dependence  on  initial  conditions 
(or  a  transversal  homoclinic  orbit),  the  dynamics  of  typical  orbits  generated  by  points  on 
or  near  the  attractor  are  nontrivial.  Third,  the  steady-state  behavior  of  orbits  generated 
by  different  points  on  or  near  the  attractor  is  similar. 

Often a strange attractor has zero volume in the original state space and a so-called fractal dimension [21]. A discussion of dimensions of strange attractors is beyond the scope of this
thesis.  However,  as  shown  in  Chapter  4,  because  of  the  similar  steady-state  behavior  of 
orbits  generated  by  points  on  the  attractor  and  because  these  orbits  occupy  a  small  region 
of  state  space,  one  can  derive  simple,  potentially  effective  state  estimation  algorithms  for 
these  systems,  which  do  not  require  full  knowledge  of  the  underlying  system  dynamics. 

As noted earlier, a common ergodic criterion for chaos is that there be a positive Lyapunov exponent. However, dissipativeness of a system $f$ generally requires that the Jacobian determinant of $f$, or of $f^n$ for some integer $n$, have absolute value less than one. As a (nonobvious) consequence of the multiplicative ergodic theorem and these constraints, a dissipative, chaotic system must have at least one negative Lyapunov exponent, and the sum of its Lyapunov exponents must be negative. Such a map needs both a positive and a negative exponent, so the state vector dimension must be at least two for a dissipative, chaotic map. Similarly, one can show that a chaotic flow always has a zero-valued Lyapunov exponent, which corresponds to motion in the flow direction [21]. Therefore, for a chaotic flow or a map arising from time-sampling the flow, the state vector dimension must be at least three. These dimension conditions make both analysis of and algorithm development for dissipative, chaotic maps and flows difficult (e.g., for state estimation algorithms). One cannot simply develop algorithms for one-dimensional systems and then adapt the results to higher dimensions.
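These constraints can be observed numerically for the Henon map of Section 2.4. The sketch below uses the QR (repeated re-orthonormalization) technique, a standard method chosen here for illustration, to estimate both Lyapunov exponents. It propagates an orthonormal frame with the Jacobian, re-orthonormalizes it with a QR factorization at every step, and averages the logarithms of the diagonal of $R$. Because $|\det D\{f(x)\}| = 0.3$ at every point, the estimated exponents sum to $\log 0.3 \approx -1.20$ up to round-off, so one of them must be negative. The run lengths and initial condition are illustrative assumptions.

```python
import numpy as np


def henon_lyapunov_spectrum(n_steps=20_000, n_transient=1_000, a=1.4, b=0.3):
    """Estimate both Lyapunov exponents of the Henon map with repeated QR factorizations."""
    x1, x2 = 0.0, 0.0
    for _ in range(n_transient):  # settle near the attractor
        x1, x2 = 1.0 - a * x1 * x1 + x2, b * x1
    Q = np.eye(2)  # orthonormal frame of tangent vectors
    log_r = np.zeros(2)
    for _ in range(n_steps):
        J = np.array([[-2.0 * a * x1, 1.0],
                      [b, 0.0]])  # Jacobian D{f(x)} at the current point
        Q, R = np.linalg.qr(J @ Q)
        log_r += np.log(np.abs(np.diag(R)))  # per-step stretching along each frame direction
        x1, x2 = 1.0 - a * x1 * x1 + x2, b * x1
    return log_r / n_steps


if __name__ == "__main__":
    lam = henon_lyapunov_spectrum()
    print(lam, lam.sum(), np.log(0.3))
```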

One  practical  problem  that  arises  in  computer  simulation  of  an  invertible,  dissipative, 
chaotic  map  is  that  the  inverse  system  is  generally  unstable  and  the  orbits  generated  by 
most  points  rapidly  tend  to  infinity.  This  follows  from  the  fact  that  since  the  system  is 
dissipative  and  thus  contracts  volumes,  the  inverse  system  expands  volumes.  Because  of 
this, it is difficult to obtain accurate backward orbit segments even for points
near  the  attractor,  where  a  backward  orbit  segment  for  a  point  is  an  orbit  segment  for  which 
the  point  is  the  final  condition. 
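The instability of backward iteration can be demonstrated with the Henon map of Section 2.4, whose inverse is available in closed form. From (2.8)-(2.9), $x_1(n) = x_2(n+1)/0.3$ and $x_2(n) = x_1(n+1) - 1 + 1.4\,x_1^2(n)$. The sketch below first checks that one inverse step undoes one forward step. It then iterates the inverse from a point near the attractor until the orbit leaves a large ball. The tiny explicit perturbation stands in for round-off error, which would eventually have the same effect. The perturbation size, escape threshold, and step limit are illustrative assumptions.

```python
A, B = 1.4, 0.3


def henon(x1, x2):
    """Henon map (2.8)-(2.9)."""
    return 1.0 - A * x1 * x1 + x2, B * x1


def henon_inverse(x1, x2):
    """Exact inverse of the Henon map: solve (2.8)-(2.9) for the previous state."""
    p1 = x2 / B
    return p1, x1 - 1.0 + A * p1 * p1


def backward_escape_step(perturbation=1e-12, threshold=1e6, max_steps=500):
    """Number of inverse iterations until a component exceeds threshold (None if never)."""
    x1, x2 = 0.0, 0.0
    for _ in range(1_000):  # forward iterations bring the point near the attractor
        x1, x2 = henon(x1, x2)
    x2 += perturbation  # stands in for accumulated round-off error
    for n in range(1, max_steps + 1):
        x1, x2 = henon_inverse(x1, x2)
        if abs(x1) > threshold or abs(x2) > threshold:
            return n
    return None


if __name__ == "__main__":
    print(henon_inverse(*henon(0.3, -0.1)))  # recovers (0.3, -0.1) up to round-off
    print(backward_escape_step())  # step at which the backward orbit escaped
```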


2.4  Examples  of  Dissipative,  Chaotic  Maps  and  Flows 

A  number  of  dissipative  maps  and  flows,  which  either  satisfy  or  are  believed  to  satisfy 
fundamental  topological  and  ergodic  properties  associated  with  chaos,  have  been  discovered 
and  reported  over  the  last  three  decades,  perhaps  the  most  noteworthy  having  been  Lorenz’s 
seminal  discovery  of  the  chaotic  flow  that  bears  his  name.  In  this  section  we  discuss  the  three 
dissipative  systems,  two  maps  and  one  flow,  used  for  the  experimental  results  presented  in 
Chapters 4 and 5. The three systems are representative of dissipative, chaotic systems in general and are perhaps the systems most frequently used in the study of dissipative chaos.

The  two  dissipative  maps  used  in  this  thesis  are  the  Henon  and  Ikeda  maps.  As  with  most 
dissipative  systems  suspected  of  being  chaotic,  the  properties  of  these  maps  are  only  partially 
understood.  Both  maps  are  dissipative  diffeomorphisms  with  state  vector  dimension  of  two, 
the minimum dimension for a dissipative, chaotic map. The state or system equations $x(n+1) = f(x(n))$ for the two maps, expressed componentwise, are the following:

Henon Map

$$x_1(n+1) = 1 - 1.4\,x_1^2(n) + x_2(n) \tag{2.8}$$
$$x_2(n+1) = 0.3\,x_1(n) \tag{2.9}$$

Ikeda Map

$$x_1(n+1) = 1 + 0.9\,[x_1(n)\cos a(n) - x_2(n)\sin a(n)] \tag{2.10}$$
$$x_2(n+1) = 0.9\,[x_1(n)\sin a(n) + x_2(n)\cos a(n)] \tag{2.11}$$
$$a(n) = 0.4 - \frac{6}{1 + x_1^2(n) + x_2^2(n)} \tag{2.12}$$

where $x(n) = [x_1(n), x_2(n)]^T$. Other choices for the constants in the above equations have also been used, but the properties of the resulting maps differ at least slightly from those of the above maps. The Ikeda map also has a compact complex form. Treat the state vector $x(n)$ as a scalar, complex quantity with real and imaginary parts $x_r(n)$ and $x_i(n)$, respectively, so that $x(n) = x_r(n) + j\,x_i(n)$ (where $j^2 = -1$). Then the Ikeda map can be expressed as follows:

$$x(n+1) = 1 + 0.9\,x(n)\exp\left\{ j\left[0.4 - \frac{6}{1 + \|x(n)\|^2}\right] \right\} \tag{2.13}$$

where $\|x(n)\|^2 = x_r^2(n) + x_i^2(n)$.
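The componentwise form (2.10)-(2.12) and the complex form (2.13) should agree, and this can be spot-checked numerically. The sketch below implements both forms, using Python's built-in complex arithmetic for (2.13), and compares one step of each on a grid of points. The grid is an arbitrary choice. It compares single steps rather than long orbits because the map's sensitive dependence on initial conditions would amplify any round-off differences between the two implementations.

```python
import cmath
import math


def ikeda_componentwise(x1, x2, u=0.9):
    """Ikeda map in the componentwise form (2.10)-(2.12)."""
    a = 0.4 - 6.0 / (1.0 + x1 * x1 + x2 * x2)
    return (1.0 + u * (x1 * math.cos(a) - x2 * math.sin(a)),
            u * (x1 * math.sin(a) + x2 * math.cos(a)))


def ikeda_complex(z, u=0.9):
    """Ikeda map in the complex form (2.13), with z = x_r + j x_i."""
    return 1.0 + u * z * cmath.exp(1j * (0.4 - 6.0 / (1.0 + abs(z) ** 2)))


def max_discrepancy(grid_size=21, radius=2.0):
    """Largest difference between the two forms over a square grid of points."""
    worst = 0.0
    for i in range(grid_size):
        for k in range(grid_size):
            x1 = -radius + 2.0 * radius * i / (grid_size - 1)
            x2 = -radius + 2.0 * radius * k / (grid_size - 1)
            c1, c2 = ikeda_componentwise(x1, x2)
            worst = max(worst, abs(ikeda_complex(complex(x1, x2)) - complex(c1, c2)))
    return worst


if __name__ == "__main__":
    print(max_discrepancy())  # on the order of round-off
```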

Figures 2-1 and 2-2 depict a typical orbit segment for each of the two maps, with a point near the attractor used as the initial condition for each segment. Note that the time ordering of the orbit points in each orbit segment is not discernible from the figures. The orbit segment for each map traces out the complicated attractor associated with that map. Because the maps are ergodic, orbit segments generated by most other initial conditions near the attractor trace out the same patterns.

Figure 2-1: Orbit Segment for the Henon Map

Figure 2-2: Orbit Segment for the Ikeda Map

The chaotic flow used in this thesis is the Lorenz flow, perhaps the most widely investigated of all chaotic systems. The state dimension for this flow is three, the minimum dimension for a dissipative, chaotic flow. The state or system equations $dx(t)/dt = F(x(t))$ for the flow, expressed componentwise, are given by

$$\frac{dx_1(t)}{dt} = 10\,[x_2(t) - x_1(t)] \tag{2.14}$$
$$\frac{dx_2(t)}{dt} = 28\,x_1(t) - x_2(t) - x_1(t)\,x_3(t) \tag{2.15}$$
$$\frac{dx_3(t)}{dt} = -\frac{8}{3}\,x_3(t) + x_1(t)\,x_2(t) \tag{2.16}$$

where $x(t) = [x_1(t), x_2(t), x_3(t)]^T$. Figure 2-3 depicts a typical trajectory of the Lorenz flow projected onto the $(x_1, x_3)$ plane. The trajectory shown in the figure, and the Lorenz trajectories used throughout the thesis, were obtained by numerically integrating the state equation using the fourth-order Runge-Kutta method with a step size of 0.005.

Figure 2-3: Projection of Lorenz Trajectory onto $(x_1, x_3)$ Plane
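A minimal sketch of the integration described above follows. It advances the Lorenz vector field (2.14)-(2.16) with the classical fourth-order Runge-Kutta method, using a step size of 0.005. Three choices are illustrative assumptions not specified in the text: the initial condition $(1, 1, 1)$, the integration length, and the length of the discarded transient.

```python
def lorenz(x):
    """Lorenz vector field F(x), equations (2.14)-(2.16)."""
    x1, x2, x3 = x
    return (10.0 * (x2 - x1),
            28.0 * x1 - x2 - x1 * x3,
            -8.0 / 3.0 * x3 + x1 * x2)


def rk4_step(F, x, h):
    """One classical fourth-order Runge-Kutta step of size h for dx/dt = F(x)."""
    k1 = F(x)
    k2 = F(tuple(xi + 0.5 * h * ki for xi, ki in zip(x, k1)))
    k3 = F(tuple(xi + 0.5 * h * ki for xi, ki in zip(x, k2)))
    k4 = F(tuple(xi + h * ki for xi, ki in zip(x, k3)))
    return tuple(xi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for xi, a, b, c, d in zip(x, k1, k2, k3, k4))


def lorenz_trajectory(x0=(1.0, 1.0, 1.0), h=0.005, n_steps=20_000):
    """Sampled trajectory x(0), x(h), ..., x(n_steps * h)."""
    traj = [x0]
    x = x0
    for _ in range(n_steps):
        x = rk4_step(lorenz, x, h)
        traj.append(x)
    return traj


if __name__ == "__main__":
    traj = lorenz_trajectory()[2_000:]  # drop an initial transient of 10 time units
    print(min(p[2] for p in traj), max(p[2] for p in traj))
```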
Chapter 3


State  Estimation  Fundamentals 


3.1  Introduction 

In this chapter we establish a foundation for the estimation problems, algorithms, and performance bounds presented in Chapters 4 and 5. The chapter begins by introducing the state estimation scenario of interest in this thesis. It then briefly reviews the two probabilistic estimation techniques, Maximum-Likelihood (ML) and Minimum-Mean-Squared-Error (MMSE), that underlie the state estimation algorithms discussed in Chapter 4. Next, the chapter briefly discusses the Kalman filter, the optimal MMSE state estimator for linear state estimation problems with Gaussian noise. It then provides a historical summary of nonlinear state estimation research in general and concludes with a more focused summary of state estimation research involving chaotic systems.

3.2  State  Estimation  Scenario 

Of the three general nonlinear state estimation scenarios traditionally considered in the estimation and control literature, two are relevant to this thesis. The first is referred to in the thesis as the DTS/DTO scenario. It consists of discrete-time state and observation equations, which take the following commonly used, but not the most general, forms:

Discrete-Time System, Discrete-Time Observation (DTS/DTO)

$$x(n+1) = f_n(x(n)) + g_n(x(n))\,w(n) \tag{3.1}$$
$$y(n) = h_n(x(n)) + v(n) \tag{3.2}$$


In the first equation, the state equation, $x(n)$ is the $\mathcal{N}$-dimensional state vector we seek to estimate. The functions $f_n$ and $g_n$ are nonlinear, possibly time-varying functions of the state, usually required to satisfy certain smoothness constraints. The driving noise $w(n)$ is an $\mathcal{N}$-dimensional, zero-mean, Gaussian, white-noise process. In the second equation, the observation equation, $y(n)$ is the $\mathcal{V}$-dimensional observation vector used for estimating $x(n)$. The function $h_n$ is a nonlinear, possibly time-varying function of the state, usually required to satisfy certain smoothness constraints. The observation noise $v(n)$ is a $\mathcal{V}$-dimensional, zero-mean, Gaussian, white-noise process. Generally, $w(n)$ and $v(n)$ are assumed to be uncorrelated with each other and with the initial condition $x(0)$.

The  second  scenario,  referred  to  as  the  CTS/DTO  scenario  in  the  thesis,  consists  of  a 
continuous-time  state  equation  and  a  discrete-time  observation  equation  with  commonly 
used,  but  not  the  most  general,  forms  of  the  equations  given  by  the  following: 

Continuous-Time System, Discrete-Time Observation (CTS/DTO)

$$dx(t) = F_t(x(t))\,dt + G_t(x(t))\,dW(t) \tag{3.3}$$
$$y(n) = h_n(x(nT)) + v(n) \tag{3.4}$$

For this scenario, the state equation is a stochastic differential equation [42, 54]. In it, $x(t)$ is the $\mathcal{N}$-dimensional state vector we seek to estimate. The functions $F_t$ and $G_t$ are nonlinear, possibly time-varying functions of the state, required to satisfy both smoothness and growth-rate constraints. The process $W(t)$ is an $\mathcal{N}$-dimensional standard Brownian motion. The observation equation for this scenario, in which $T$ denotes the sampling interval, has the same interpretation as for the DTS/DTO scenario.

Because the focus of this thesis is on chaos, only restricted forms of the DTS/DTO and CTS/DTO scenarios are of interest. In particular, we require the functions $f_n$, $F_t$, $g_n$, $G_t$, and $h_n$ in (3.1)-(3.4) to be time-invariant and thus expressible as $f$, $F$, $g$, $G$, and $h$. In addition, we require $f$ in (3.1) to be a chaotic map and $F$ in (3.3) to be a chaotic flow. In Chapters 4 and 5 we further require that $f$ and $F$ be dissipative diffeomorphisms. Finally, our interest is in deterministic systems and the properties exhibited by a class of these systems. We therefore consider the restricted form of the state equations in which driving noise is absent. With these restrictions, the equations for the DTS/DTO model reduce to the following:


$$x(n+1) = f(x(n)) \tag{3.5}$$
$$y(n) = h(x(n)) + v(n) \tag{3.6}$$

and the equations for the CTS/DTO scenario reduce to the following:

$$\frac{dx(t)}{dt} = F(x(t)) \tag{3.7}$$
$$y(n) = h(x(nT)) + v(n) \tag{3.8}$$

where $f$ is a dissipative, chaotic diffeomorphism and $F$ is a dissipative, chaotic flow.

The omission of driving noise in (3.5) and (3.7) renders these state equations fundamentally different from the more general equations (3.1) and (3.3), respectively, from which they came. In particular, the randomness of the processes $x(n)$ and $x(t)$ generated by (3.1) and (3.3) has two sources: uncertainty in the initial condition $x(0)$ and the driving noise terms $w(n)$ and $W(t)$. In contrast, the randomness of the processes generated by (3.5) and (3.7) is due solely to uncertainty in the initial condition. That is, if the initial condition is known with certainty, the state is also known with certainty at all future times, regardless of the observation noise $v(n)$. If the system is invertible, the same holds at all past times. Consequently, the deterministic problem considered in this thesis is simpler than the problem involving a noise-driven state equation. This simplicity facilitates the derivation of potentially effective, albeit heuristic, state estimation algorithms. In addition, deriving and interpreting performance bounds for the deterministic problem, as is done in Chapter 5, is far simpler than doing so for the problem involving a noise-driven state equation.

Nonetheless, state estimation involving a deterministic, chaotic system has many similarities to state estimation involving a noise-driven system, since both involve nontrivial, invariant measures and positive entropy rates. In addition, chaotic systems exhibit sensitive dependence on initial conditions, and round-off error is inevitable in computer simulations involving chaos. For these reasons, a state equation with a small driving noise term is sometimes a more representative model of the underlying system dynamics when dealing with computer-generated chaos. We do not adopt such a model in this thesis. In the next chapter, however, we address the problems that computer round-off error causes when designing practical state estimation algorithms, and we offer simple, albeit suboptimal, remedies for these problems.

As suggested in Chapter 2, we can express the restricted CTS/DTO scenario given by (3.7) and (3.8) as a DTS/DTO scenario. In particular, because (3.7) is deterministic, time-sampling it yields the following discrete-time state equation:

$$z(n+1) = f_T(z(n)) \tag{3.9}$$

where

$$z(n) = x(nT) \tag{3.10}$$
$$f_T(z(n)) = z(n) + \int_{nT}^{(n+1)T} F(x(t))\,dt \tag{3.11}$$

and $x(t)$ in (3.11) is the solution of (3.7) with $x(nT) = z(n)$.

However, as mentioned in Chapter 2, chaotic maps arising from time-sampled chaotic flows have certain properties not shared by all chaotic maps. In particular, since a differentiable flow is always invertible, any map that arises by time-sampling the flow is also invertible. Also, as discussed earlier, a chaotic flow must have a state vector dimension of at least three and at least one zero-valued Lyapunov exponent. The same applies to maps that arise by time-sampling the flow.
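The time-$T$ map (3.11) can be approximated by integrating the flow numerically over one sampling interval, and its inverse by integrating backward in time. The sketch below does this for the Lorenz flow (2.14)-(2.16) with the fourth-order Runge-Kutta method. The sampling interval $T = 0.05$, the step size, and the test point are assumptions. The backward integration is itself only an approximation, so the round trip recovers the starting point to within the integrator's truncation error rather than exactly.

```python
def lorenz(x):
    """Lorenz vector field F(x), equations (2.14)-(2.16)."""
    x1, x2, x3 = x
    return (10.0 * (x2 - x1),
            28.0 * x1 - x2 - x1 * x3,
            -8.0 / 3.0 * x3 + x1 * x2)


def rk4_step(F, x, h):
    """One classical fourth-order Runge-Kutta step of size h (h may be negative)."""
    k1 = F(x)
    k2 = F(tuple(xi + 0.5 * h * ki for xi, ki in zip(x, k1)))
    k3 = F(tuple(xi + 0.5 * h * ki for xi, ki in zip(x, k2)))
    k4 = F(tuple(xi + h * ki for xi, ki in zip(x, k3)))
    return tuple(xi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for xi, a, b, c, d in zip(x, k1, k2, k3, k4))


def time_sampled_map(z, T=0.05, h=0.005):
    """Approximate f_T(z) in (3.11) by integrating the flow for T time units.

    A negative h integrates backward in time, approximating the inverse map.
    """
    n_steps = round(T / abs(h))
    for _ in range(n_steps):
        z = rk4_step(lorenz, z, h)
    return z


def time_sampled_inverse(z, T=0.05, h=0.005):
    """Approximate f_T^{-1}(z) by integrating the flow backward for T time units."""
    return time_sampled_map(z, T, -h)


if __name__ == "__main__":
    z0 = (-5.0, -7.0, 20.0)
    z1 = time_sampled_map(z0)
    print(z1, time_sampled_inverse(z1))  # second tuple is close to z0
```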

3.3 Maximum-Likelihood (ML) and Minimum-Mean-Squared-Error (MMSE) State Estimation

This thesis focuses on estimating the state $x(n_0)$ for the restricted DTS/DTO and CTS/DTO problem scenarios introduced in the previous section, either at a fixed time $n_0$ or at a sequential set of times. The estimate uses a given set of observations $Y = \{y(i)\}$, which will generally be sequential in time and thus expressible as $Y(M,N) = \{y(i)\}_{i=M}^{N}$, where $M$ and $N$ are integers with $M \le N$. Both filtering and smoothing are considered. Filtering involves estimating $x(n_0)$, the state at time $n_0$, using observations $y(i)$ only for times $i \le n_0$. Smoothing involves estimating $x(n_0)$ using observations for times $i > n_0$ as well.

Two probabilistic state estimation approaches that have proven useful in many applications [52, 72, 86] are emphasized: Maximum-Likelihood (ML) and Minimum-Mean-Squared-Error (MMSE). Recall that ML parameter estimation treats the unknown parameter one seeks to estimate as a nonrandom quantity. The ML estimate is the value of the parameter that maximizes an appropriately defined likelihood function or, equivalently, the logarithm of the likelihood function. For the problem of interest here, the unknown, nonrandom parameter is $x(n_0)$, and a set of observations $Y$ is given. The ML estimate of $x(n_0)$, hereafter denoted $\hat{x}_{ML}(n_0)$, is the value of $x(n_0)$ that maximizes the likelihood function $p(Y; x(n_0))$. Here $p(Y; x(n_0))$ denotes the probability density function (PDF) of the observation set $Y$ for a given $x(n_0)$, and an underlying assumption is that this PDF exists. For the restricted DTS/DTO scenario, (3.6) and the assumptions on the observation noise sequence $\{v(n)\}$ imply that $\log p(y(i); x(i))$ is given by

$$\log p(y(i); x(i)) = \log\left[(2\pi)^{\mathcal{V}}|R|\right]^{-1/2} - \frac{1}{2}\,[y(i) - h(x(i))]^T R^{-1} [y(i) - h(x(i))] \tag{3.12}$$

where $\mathcal{V}$ is the dimension of the observation vector and $R$ is the covariance matrix of $v(n)$. The state equation (3.5) is deterministic and $f$ is assumed invertible, so $x(i) = f^{i-n_0}(x(n_0))$ for all $i$. We therefore also have

$$\log p(y(i); x(n_0)) = \log\left[(2\pi)^{\mathcal{V}}|R|\right]^{-1/2} - \frac{1}{2}\left[y(i) - h(f^{i-n_0}(x(n_0)))\right]^T R^{-1} \left[y(i) - h(f^{i-n_0}(x(n_0)))\right] \tag{3.13}$$

Using this equality and exploiting the whiteness of the observation noise leads to the following expression for the log-likelihood function $\log p(Y(M,N); x(n_0))$ for the restricted DTS/DTO scenario:

$$\log p(Y(M,N); x(n_0)) = \log\left[(2\pi)^{\mathcal{V}}|R|\right]^{-(N-M+1)/2} - \frac{1}{2}\sum_{i=M}^{N}\left[y(i) - h(f^{i-n_0}(x(n_0)))\right]^T R^{-1}\left[y(i) - h(f^{i-n_0}(x(n_0)))\right] \tag{3.14}$$

The log-likelihood function for the CTS/DTO scenario has a similar form.
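As a concrete instance of (3.14), the sketch below evaluates the log-likelihood for the Henon map of Section 2.4. It assumes, for illustration, that $h$ is the identity and $R = \sigma^2 I$. Observations at times before $n_0$ require $f^{i-n_0}$ with a negative exponent, so the sketch uses the exact inverse of the Henon map. The inverse map expands volumes, so the backward part of the orbit amplifies any error in $x(n_0)$, and the observation window is kept short. The helper names, window, noise level, and random seed are hypothetical choices.

```python
import numpy as np

A, B = 1.4, 0.3


def f(x):
    """Henon map (2.8)-(2.9)."""
    return np.array([1.0 - A * x[0] ** 2 + x[1], B * x[0]])


def f_inv(x):
    """Exact inverse of the Henon map."""
    p1 = x[1] / B
    return np.array([p1, x[0] - 1.0 + A * p1 ** 2])


def f_power(x, k):
    """f^k(x) for any integer k; negative k applies the inverse map |k| times."""
    step = f if k >= 0 else f_inv
    for _ in range(abs(k)):
        x = step(x)
    return x


def log_likelihood(x_n0, Y, M, n0, R, h=lambda x: x):
    """log p(Y(M,N); x(n0)) from (3.14); Y[k] holds y(M + k), so N = M + len(Y) - 1."""
    R_inv = np.linalg.inv(R)
    _, logdet = np.linalg.slogdet(2.0 * np.pi * R)  # log[(2 pi)^V |R|]
    total = 0.0
    for k, y in enumerate(Y):
        r = y - h(f_power(x_n0, M + k - n0))
        total += -0.5 * logdet - 0.5 * r @ R_inv @ r
    return total


# Illustrative data: 11 noisy observations y(0), ..., y(10) with n0 = 5.
rng = np.random.default_rng(1)
sigma = 0.01
R = sigma ** 2 * np.eye(2)
x = np.array([0.0, 0.0])
for _ in range(1_000):  # start near the attractor
    x = f(x)
orbit = [x]
for _ in range(10):
    orbit.append(f(orbit[-1]))
Y = [xi + sigma * rng.standard_normal(2) for xi in orbit]
M, n0 = 0, 5
x_true = orbit[n0]

if __name__ == "__main__":
    print(log_likelihood(x_true, Y, M, n0, R))
    print(log_likelihood(x_true + np.array([1e-3, 0.0]), Y, M, n0, R))
```

Even a perturbation of $10^{-3}$ in $x(n_0)$ lowers the log-likelihood sharply here, mostly because the backward iterates diverge from the observations.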

Also recall that MMSE estimation, in contrast to ML estimation, treats the unknown parameter one seeks to estimate as a random quantity. For the problem of interest here, $x(n_0)$ is the unknown parameter to be estimated and $Y$ is a given observation set. The mean-squared estimation error is then given by

$$\int\!\!\int \|x(n_0) - \hat{x}(n_0)\|^2\, p(x(n_0), Y)\, dx(n_0)\, dY \tag{3.15}$$

Here $\|\cdot\|$ denotes the Euclidean norm, $\hat{x}(n_0)$ denotes an arbitrary estimator for $x(n_0)$, and $p(x(n_0), Y)$ denotes the joint PDF of the state vector $x(n_0)$ and observation set $Y$. The integration is over the state vector $x(n_0)$ and the entire observation set $Y$. A fundamental result in estimation theory is that the MMSE estimator is obtained by choosing the conditional mean $E(x(n_0)|Y)$ as the estimate of $x(n_0)$ for each observation set $Y$, where $E(x(n_0)|Y)$ is given by

$$E(x(n_0)|Y) = \int x(n_0)\, p(x(n_0)|Y)\, dx(n_0) \tag{3.16}$$

and where $p(x(n_0)|Y)$, the a posteriori state density, is the PDF of $x(n_0)$ conditioned on the observation set $Y$. Bayes' rule allows $E(x(n_0)|Y)$ also to be expressed as

$$E(x(n_0)|Y) = \frac{\int x(n_0)\, p(Y|x(n_0))\, p(x(n_0))\, dx(n_0)}{\int p(Y|x(n_0))\, p(x(n_0))\, dx(n_0)} \tag{3.17}$$

where $p(x(n_0))$ denotes the unconditional, or a priori, PDF of $x(n_0)$ and $p(Y|x(n_0))$ denotes the PDF of $Y$ conditioned on $x(n_0)$. Note that $p(Y|x(n_0))$ has the same form as the PDF $p(Y; x(n_0))$ defined earlier. The two PDFs differ in that $x(n_0)$ is a random vector in the former and a nonrandom vector in the latter.

Each of these expressions carries an inherent assumption. Equation (3.15) assumes the existence of the joint PDF $p(x(n_0), Y)$ with respect to the product measure $dx(n_0)\,dY$. Equation (3.16) assumes the existence of the conditional PDF $p(x(n_0)|Y)$ with respect to the measure $dx(n_0)$, i.e., Lebesgue measure on $\mathbb{R}^{\mathcal{N}}$, where $\mathcal{N}$ is the dimension of $x(n_0)$. Equation (3.17) assumes the existence of the PDF $p(x(n_0))$ with respect to the measure $dx(n_0)$. It also assumes the existence of the conditional PDF $p(Y|x(n_0))$ with respect to the measure $dY$, i.e., Lebesgue measure on $\mathbb{R}^{(N-M+1)\mathcal{V}}$, where $N - M + 1$ is the number of observations and $\mathcal{V}$ is the dimension of each observation vector. For dissipative, chaotic systems, these assumptions are not necessarily valid. For example, suppose the only a priori knowledge about $x(n_0)$ is that it lies on the attractor. Then an appropriate a priori distribution for $x(n_0)$ is given by the physical measure on the attractor. Intuitively, this a priori distribution corresponds to $x(n_0)$ being equally likely to be at each point on an (infinitely long) orbit on the attractor. However, as mentioned earlier, for dissipative, chaotic systems the physical measure is often singular with respect to Lebesgue measure on $\mathbb{R}^{\mathcal{N}}$. It thus has no corresponding PDF with respect to Lebesgue measure. Consequently, the PDFs $p(x(n_0))$, $p(x(n_0), Y)$, and $p(x(n_0)|Y)$ are generally not defined for dissipative systems when the a priori distribution of $x(n_0)$ is given by the physical measure on the attractor. In contrast, the likelihood function $p(Y|x(n_0))$ is well-defined in these situations because of our assumptions on the observation noise.

When the joint density $p(x(n_0), Y)$ is not defined, we can express the MSE with respect to the joint probability measure on $x(n_0)$ and $Y$, or alternatively as follows:

$$\int\!\!\int \|x(n_0) - \hat{x}(n_0)\|^2\, p(Y|x(n_0))\, d\mu_{x(n_0)}\, dY \tag{3.18}$$

where $\mu_{x(n_0)}$ denotes the a priori distribution of $x(n_0)$, and the integration over $x(n_0)$ is defined as a Lebesgue integral with respect to this measure. Similarly, we can express the conditional mean as

$$E(x(n_0)|Y) = \frac{\int x(n_0)\, p(Y|x(n_0))\, d\mu_{x(n_0)}}{\int p(Y|x(n_0))\, d\mu_{x(n_0)}} \tag{3.19}$$

As we show in Chapter 4, this definition of the conditional mean is abstract and not directly computable in practice. It can, however, be cast in a revealing form that gives rise to a converging sequence of simple, approximate MMSE state estimators.
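One simple Monte Carlo reading of (3.19) replaces both integrals over the physical measure by averages over points of a long orbit on the attractor, which ergodicity motivates. The result is a likelihood-weighted average of orbit points. The sketch below is an illustration under assumptions: the Henon map of Section 2.4, a single observation with $h$ equal to the identity, and $R = \sigma^2 I$. It is not necessarily the family of estimators developed in Chapter 4. The function names, orbit length, noise level, and seed are hypothetical choices.

```python
import numpy as np

A, B = 1.4, 0.3


def henon_orbit(n, x1=0.0, x2=0.0, n_transient=1_000):
    """n consecutive Henon orbit points, (2.8)-(2.9), after discarding a transient."""
    for _ in range(n_transient):
        x1, x2 = 1.0 - A * x1 * x1 + x2, B * x1
    out = np.empty((n, 2))
    for k in range(n):
        x1, x2 = 1.0 - A * x1 * x1 + x2, B * x1
        out[k] = (x1, x2)
    return out


def approx_conditional_mean(y, prior_samples, sigma):
    """Likelihood-weighted average of prior samples, approximating (3.19).

    Assumes a single observation y = x(n0) + v with h = identity and R = sigma^2 I.
    """
    r2 = np.sum((prior_samples - y) ** 2, axis=1)
    log_w = -0.5 * r2 / sigma ** 2
    w = np.exp(log_w - log_w.max())  # subtract the maximum to avoid underflow
    w /= w.sum()
    return w @ prior_samples


orbit = henon_orbit(20_001)
prior_samples = orbit[:-1]  # stand-in for samples from the physical measure
x_true = orbit[-1]          # state to be estimated (not among the samples)
sigma = 0.05
rng = np.random.default_rng(2)
y = x_true + sigma * rng.standard_normal(2)
x_hat = approx_conditional_mean(y, prior_samples, sigma)

if __name__ == "__main__":
    print(x_true, y, x_hat)
```

Working with the logarithms of the weights and subtracting their maximum keeps the normalization stable even when every likelihood value is tiny.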


3.4  Linear  State  Estimation  and  the  Kalman  Filter 

The  Kalman  filter  is  a  computationally  efficient,  recursive  MMSE  state  estimator  for  both 
continuous-time  and  discrete-time,  linear,  state-space  models  with  certain  restrictions  on 
the  driving  noise,  observation  noise,  and  distribution  of  the  initial  state  [5,  30,  38,  57].  Of 
relevance to this thesis is the form of the Kalman filter applicable to the following linear DTS/DTO scenario:

$$x(n+1) = F_n x(n) + G_n w(n) \tag{3.20}$$
$$y(n) = H_n x(n) + v(n) \tag{3.21}$$

where $F_n$, $G_n$, and $H_n$ are matrices, and $w(n) \sim \mathcal{N}(0, Q_n)$, $v(n) \sim \mathcal{N}(0, R_n)$, and $x(0) \sim \mathcal{N}(m_0, P_0)$. Here $\mathcal{N}(m, P)$ denotes the normal distribution with mean vector $m$ and covariance matrix $P$. The driving noise $w(n)$ and observation noise $v(n)$ are independent of each other and of the initial state.

The Kalman filter exploits two facts about linear state estimation: the a posteriori density $p(x(n)|Y(0,n))$ is Gaussian, and the mean of this density is the MMSE state estimate. The filter uses a two-step procedure to recursively compute two quantities, the state estimate $\hat{x}(n)$ and the error covariance matrix $P(n)$, given by

$$P(n) = E\left[(x(n) - \hat{x}(n))(x(n) - \hat{x}(n))^T \,\big|\, Y(0,n)\right] \tag{3.22}$$

In the first step, the prediction step, the state estimate and covariance matrix for time $n+1$ are computed from the final state estimate and covariance matrix for time $n$. In the second step, the measurement (or observation) update step, the quantities calculated in the first step are updated using the new observation $y(n+1)$. As is the usual convention, we let $\hat{x}(n+1|n)$ and $P(n+1|n)$ denote the state estimate and error covariance matrix, respectively, computed in the prediction step. This notation emphasizes that these quantities are for time $n+1$ based on observations through time $n$. Similarly, we let $\hat{x}(n+1|n+1)$ and $P(n+1|n+1)$ denote the updated quantities calculated in the measurement update step, which are for time $n+1$ based on observations through time $n+1$. These definitions are used in Table 3.1, which provides the equations for the two-step estimation procedure constituting the DTS/DTO Kalman filter.

For many applications, smoothing improves state estimation performance over filtering. As shown in the next chapter, this is especially true with chaotic systems. Historically, research has focused on three classes of smoothing problems. The first, fixed-point smoothing, involves estimating the state vector $x(n)$ for a fixed time $n$ based on the observation set $Y(0,m) = \{y(i)\}_{i=0}^{m}$, where $m \ge n$ increases. The second, fixed-lag smoothing, involves estimating the state vector $x(n-L)$ for each time $n$ and a fixed lag $L$, based on the observation set $Y(0,n) = \{y(i)\}_{i=0}^{n}$. The third, fixed-interval smoothing, involves estimating the state vector $x(n)$ for all times $n$ satisfying $0 \le n \le N$, based on the observation set $Y(0,N) = \{y(i)\}_{i=0}^{N}$.
Prediction Step

$$\hat{x}(n+1|n) = F_n \hat{x}(n|n) \tag{3.23}$$

$$P(n+1|n) = F_n P(n|n) F_n^T + G_n Q_n G_n^T \tag{3.24}$$

Measurement Update Step

$$\hat{x}(n+1|n+1) = \hat{x}(n+1|n) + K(n+1)\left[y(n+1) - H_{n+1}\hat{x}(n+1|n)\right] \tag{3.25}$$

$$K(n+1) = P(n+1|n) H_{n+1}^T \left[H_{n+1} P(n+1|n) H_{n+1}^T + R(n+1)\right]^{-1} \tag{3.26}$$

$$P(n+1|n+1) = \left[I_{\mathcal{N}} - K(n+1) H_{n+1}\right] P(n+1|n) \tag{3.27}$$

Initialization

$$\hat{x}(0|-1) = m_0 \tag{3.28}$$

$$P(0|-1) = P_0 \tag{3.29}$$

Table 3.1: The Kalman Filter Equations for the DTS/DTO Model
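As a minimal sketch of the two-step recursion in Table 3.1, the Python code below implements the prediction and measurement update steps for a scalar, time-invariant model, so that $F_n$, $G_n$, $Q_n$, $H_{n+1}$, and $R(n+1)$ are fixed numbers and every matrix operation reduces to ordinary arithmetic. The scalar, time-invariant restriction and the function names are our own illustrative assumptions. A vector version would use matrix products and a linear solve in place of the division.

```python
def kf_predict(x, P, F, G, Q):
    """Prediction step, Eqs. (3.23)-(3.24), for a scalar state."""
    x_pred = F * x
    P_pred = F * P * F + G * Q * G
    return x_pred, P_pred


def kf_update(x_pred, P_pred, y, H, R):
    """Measurement update step, Eqs. (3.25)-(3.27), for a scalar observation."""
    K = P_pred * H / (H * P_pred * H + R)      # Kalman gain, Eq. (3.26)
    x_upd = x_pred + K * (y - H * x_pred)      # Eq. (3.25)
    P_upd = (1.0 - K * H) * P_pred             # Eq. (3.27)
    return x_upd, P_upd


def kalman_filter(ys, F, G, Q, H, R, m0, P0):
    """Run the filter over y(0), y(1), ...; returns [(x(n|n), P(n|n)), ...]."""
    x, P = m0, P0                              # x(0|-1), P(0|-1), Eqs. (3.28)-(3.29)
    estimates = []
    for y in ys:
        x, P = kf_update(x, P, y, H, R)        # fold in y(n)
        estimates.append((x, P))
        x, P = kf_predict(x, P, F, G, Q)       # move on to time n + 1
    return estimates
```

With a constant state ($F = 1$, $Q = 0$), unit observation noise, and a nearly uninformative prior ($P_0 = 10^6$), $P(n|n)$ decreases approximately as $1/(n+1)$, the variance of an average of $n+1$ unit-variance observations.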

In an earlier report [76], we considered fixed-interval smoothing in the context of chaos. In this thesis, we consider fixed-lag smoothing, which we have found to offer comparable if not superior performance with chaotic systems. There are various equivalent methods for recursively computing the fixed-lag estimate $\hat{x}(n-L|n)$. The most straightforward [5] involves first forming an extended state vector $X(n) = [x(n)^T, x(n-1)^T, \ldots, x(n-L)^T]^T$, applying the Kalman filter equations to the system obtained with this state vector, and noting that various submatrices of the error covariance matrix for the extended system can be updated separately, with only some of these needed for recursively estimating $X(n)$. With this approach, one obtains not only the desired fixed-lag estimate $\hat{x}(n-L|n)$ but also the estimates $\hat{x}(n-i|n)$ for $i = 1, 2, \ldots, L-1$. The resulting fixed-lag smoothing equations for the estimates $\hat{x}(n-i|n)$, $i = 1, 2, \ldots, L$, are provided in Table 3.2. These equations supplement the Kalman filtering equations provided in Table 3.1.
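The sketch below combines the recursions of Tables 3.1 and 3.2 for a scalar, time-invariant model (our simplifying assumption; the function name is hypothetical). Index 0 of each list holds the Kalman filter quantities and index $i$ holds the lag-$i$ quantities, so after processing $y(n)$ the returned estimates are $\hat{x}(n|n), \hat{x}(n-1|n), \ldots, \hat{x}(n-L|n)$. For $n < L$, lag entries that refer to times before 0 stay at the zero initialization of (3.40)-(3.42), because their gains are zero.

```python
def fixed_lag_smoother(ys, F, G, Q, H, R, m0, P0, L):
    """Scalar fixed-lag smoother; returns [(x_list, Pii_list), ...], one per y(n).

    x_list[i] is x(n-i|n) and Pii_list[i] is P_{i,i}(n|n), for i = 0..L.
    """
    x = [m0] + [0.0] * L        # x_i(0|-1), Eqs. (3.28), (3.40)
    Pii = [P0] + [0.0] * L      # P_{i,i}(0|-1), Eqs. (3.29), (3.41)
    Pi0 = [P0] + [0.0] * L      # P_{i,0}(0|-1), Eq. (3.42); Pi0[0] is P_{0,0}
    out = []
    for y in ys:
        # Measurement update, Eqs. (3.25)-(3.27) and (3.36)-(3.39)
        P00 = Pi0[0]
        S = H * P00 * H + R                       # innovation variance
        innov = y - H * x[0]                      # y(n) - H x(n|n-1)
        K = [p * H / S for p in Pi0]              # K_i, Eq. (3.37)
        x = [xi + Ki * innov for xi, Ki in zip(x, K)]
        Pii = [pii - Ki * H * pi0 for pii, Ki, pi0 in zip(Pii, K, Pi0)]
        Pi0 = [pi0 - Ki * H * P00 for pi0, Ki in zip(Pi0, K)]
        out.append((list(x), list(Pii)))
        # Prediction, Eqs. (3.23)-(3.24) and (3.33)-(3.35): shift every lag by one
        x = [F * x[0]] + x[:-1]
        P00_pred = F * Pi0[0] * F + G * Q * G
        Pii = [P00_pred] + Pii[:-1]
        Pi0 = [P00_pred] + [p * F for p in Pi0[:-1]]
    return out
```

As a check on the design, with a random-walk model ($F = G = Q = 1$) the smoothed variance $P_{1,1}(n|n)$, which is the error variance of $\hat{x}(n-1|n)$, comes out strictly smaller than the filtered variance $P(n-1|n-1)$, reflecting the extra information carried by $y(n)$.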

Definitions

$$\hat{x}_i(n|n) = \hat{x}(n-i|n), \quad i = 1, \ldots, L \tag{3.30}$$

$$\hat{x}_0(n|n) = \hat{x}(n|n) \quad \text{(from Kalman filter equations)} \tag{3.31}$$

$$P_{0,0}(n|n) = P(n|n) \quad \text{(from Kalman filter equations)} \tag{3.32}$$

Prediction Step

$$\hat{x}_i(n+1|n) = \hat{x}_{i-1}(n|n) \tag{3.33}$$

$$P_{i,i}(n+1|n) = P_{i-1,i-1}(n|n) \tag{3.34}$$

$$P_{i,0}(n+1|n) = P_{i-1,0}(n|n) F_n^T \tag{3.35}$$

Measurement Update Step

$$\hat{x}_i(n+1|n+1) = \hat{x}_i(n+1|n) + K_i(n+1)\left[y(n+1) - H_{n+1}\hat{x}(n+1|n)\right] \tag{3.36}$$

$$K_i(n+1) = P_{i,0}(n+1|n) H_{n+1}^T \left[H_{n+1} P_{0,0}(n+1|n) H_{n+1}^T + R(n+1)\right]^{-1} \tag{3.37}$$

$$P_{i,i}(n+1|n+1) = P_{i,i}(n+1|n) - K_i(n+1) H_{n+1} P_{i,0}^T(n+1|n) \tag{3.38}$$

$$P_{i,0}(n+1|n+1) = P_{i,0}(n+1|n) - K_i(n+1) H_{n+1} P_{0,0}(n+1|n) \tag{3.39}$$

Initialization

$$\hat{x}_i(0|-1) = 0 \tag{3.40}$$

$$P_{i,i}(0|-1) = [0] \tag{3.41}$$

$$P_{i,0}(0|-1) = [0] \tag{3.42}$$

Table 3.2: The Fixed-Lag Smoothing Equations for the DTS/DTO Model

3.5 Nonlinear State Estimation

There has been much past research on nonlinear state estimation, most notably a flurry of activity in the early 1970s involving both discrete-time and continuous-time systems and a second wave of activity focusing on continuous-time systems in the early 1980s. Until recently, nearly all of this research dealt with noise-driven state equations for nonchaotic systems. Also, the emphasis has been on the classical, recursive, filtering
problem in which all parameters in the system model are known and the objective is to estimate the state at time $n$ using only the estimate at time $n-1$ and the observation at time $n$. With few exceptions, the focus has been on recursive computation of the a posteriori PDF $p(x(n_0) \mid Y)$ used for MMSE state estimation.

The  fundamental  problem  encountered  with  most  nonlinear  systems  is  that  this  density 
requires  an  infinite  number  of  parameters  (e.g.,  moments)  for  its  specification  at  each  time 
instant.  As  a  result,  optimal,  recursive,  MMSE  state  estimation  with  these  systems  entails 
propagating  an  infinite  set  of  parameters  through  time.  Although  it  is  straightforward  to 
derive  a  recursive  equation,  known  as  the  forward-Kolmogorov  or  Wiener-Hopf  equation, 
for  updating  the  a  posteriori  PDF,  simplifying  approximations  are  needed  for  implementing 
the equation in practical applications. Consequently, the challenge for researchers in nonlinear state estimation has been, and remains, the development of practical algorithms for approximating and recursively updating the a posteriori PDF, ideally in some optimal sense.

The situation of an infinite-dimensional a posteriori PDF, encountered in many nonlinear state estimation problems, contrasts markedly with that for linear state estimation problems. As discussed in the previous section, under certain conditions the a posteriori PDF for linear state estimation problems is Gaussian and thus completely characterized by two finite sets of parameters, a mean vector and a covariance matrix, with the mean vector being the MMSE state estimate.

Historically, two broad classes of techniques for nonlinear state estimation have been pursued: local and global. Local techniques share the property of using a single, time-varying reference point to calculate quantities, e.g., derivatives, moments, or series coefficients, at each time instant. These techniques generally work best when the a posteriori PDF is nearly Gaussian or at least unimodal. Included among these techniques are those that approximate and propagate a finite set of low-order moments of the a posteriori PDF, as well as series expansion techniques in which the a posteriori PDF is represented by a series expansion and a finite number of coefficients are propagated through time [57, 83, 84].

The most popular local technique, and indeed the most popular of all techniques for nonlinear state estimation, has been the extended Kalman filter (EKF). Its popularity is primarily due to its simplicity, ease of implementation, and surprisingly good performance with many nonlinear systems. In the next chapter, we show that the EKF can be incorporated in a state estimation algorithm for chaotic systems that is potentially effective even when the system dynamics are unknown.
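To make the local character of the EKF concrete, the sketch below performs one EKF cycle for an assumed scalar model $x(n+1) = f(x(n)) + w(n)$ observed as $y(n) = x(n) + v(n)$. The covariance is propagated with the derivative $f'$ evaluated at the single reference point $\hat{x}(n|n)$. The scalar model, the driving-noise variance `Q`, and the function name are our illustrative assumptions; this is not the chaotic-system estimator developed in the next chapter.

```python
def ekf_step(x_est, P, y_next, f, df, Q, R):
    """One EKF cycle: predict with f, linearize with df at x_est, update with y(n+1)."""
    x_pred = f(x_est)                    # propagate the estimate through the map
    Fn = df(x_est)                       # local linearization at the reference point
    P_pred = Fn * P * Fn + Q             # linearized covariance propagation
    K = P_pred / (P_pred + R)            # observation function h is the identity
    x_new = x_pred + K * (y_next - x_pred)
    P_new = (1.0 - K) * P_pred
    return x_new, P_new
```

When $f$ is linear, $f'$ is constant and this step coincides with the Kalman filter of Table 3.1. With a chaotic map such as the logistic map $f(x) = 3.8x(1-x)$, the linearization is trustworthy only while the estimation error is small relative to the scale on which $f'$ varies.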

The unifying element of global techniques is the numerical evaluation of the a posteriori PDF with the use of a possibly time-varying, finite grid of points or regions at which calculations and approximations are made [83, 84]. Many global techniques emerged during the early 1970s. Among the most popular was the Gaussian sum approach of Alspach and Sorenson [4], which essentially consisted of a combination of extended Kalman filters operating in parallel. In addition, Bucy and Senne [15] developed and refined a point-mass approach, which entailed approximating the a posteriori density with a finite set of impulses or point masses defined on a time-varying, finite grid. Other global techniques involved the use of Hermite expansions and Gauss-Hermite integration [36]. The first use of dynamic programming for nonlinear state estimation, an intrinsic element of hidden Markov modeling approaches, occurred in 1966 with Larson's modal trajectory approach [51]. The use of approximation techniques for nonlinear state estimation with continuous-time systems has been a more recent development. Kushner [47, 48] pioneered notable work on the continuous-time problem with his use of interpolated, finite-state Markov chains for approximating the continuous-time Wiener-Hopf equation, a nonlinear partial differential equation, to propagate the a posteriori PDF through time.
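A minimal sketch of the point-mass idea follows. It is in the spirit of the approach of Bucy and Senne [15] but simplified by us to a fixed, one-dimensional grid (their grids are time-varying). The a posteriori density is represented by normalized weights on the grid points: the measurement update multiplies the weights by a Gaussian likelihood, and the prediction step evaluates a discretized Chapman-Kolmogorov sum with Gaussian driving noise. The scalar model $x(n+1) = f(x(n)) + w(n)$, $y(n) = x(n) + v(n)$, the Gaussian noise assumptions, and all names are illustrative assumptions.

```python
import math


def gauss(z, var):
    """Gaussian density with zero mean and variance var, evaluated at z."""
    return math.exp(-0.5 * z * z / var) / math.sqrt(2.0 * math.pi * var)


def point_mass_filter(ys, grid, prior, f, q_var, r_var):
    """Approximate MMSE estimates x(n|n) from point masses on a fixed grid."""
    w = list(prior)
    estimates = []
    for y in ys:
        # Measurement update: weight each point mass by p(y | x)
        w = [wi * gauss(y - xi, r_var) for wi, xi in zip(w, grid)]
        total = sum(w)
        w = [wi / total for wi in w]
        estimates.append(sum(wi * xi for wi, xi in zip(w, grid)))  # conditional mean
        # Prediction: discretized Chapman-Kolmogorov sum onto the same grid
        w = [sum(wj * gauss(xk - f(xj), q_var) for wj, xj in zip(w, grid)) for xk in grid]
        total = sum(w)
        w = [wi / total for wi in w]
    return estimates
```

The cost grows with the square of the number of grid points per step, which is why grid-based methods are usually confined to low-dimensional states. If an observation falls far outside the grid, every weight can underflow to zero; a more robust version would carry log-weights.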


3.6  State  Estimation  and  Chaos 

Several  state  estimation  algorithms  for  chaotic  systems  have  emerged  in  the  past  few  years, 
primarily  from  the  physics  community.  All  of  the  algorithms  are  heuristic,  with  most 
exploiting  known  topological  or  ergodic  properties  of  chaotic  systems.  One  limitation  of 
most  of  these  algorithms,  particularly  those  which  exploit  topological  properties,  is  that 
they  are  effective  only  with  large  input  signal-to-noise  ratios  (SNRs),  typically  those  well 
in  excess  of  20  dB.  In  this  section,  we  highlight  those  techniques  of  particular  relevance  to 
this  thesis  because  of  either  their  probabilistic  or  signal  processing  foundations. 

In contrast to most other research on nonlinear state estimation, the research involving chaotic systems has focused on deterministic state equations (i.e., the absence of driving noise), on fixed-interval smoothing, and on scenarios involving unknown system dynamics. In particular, three general problem scenarios have been investigated:
1. State estimation when the system dynamics are known, i.e., the chaotic map $f$ in (3.5) or chaotic flow $F$ in (3.7) is known.

2. State estimation when the system dynamics are unknown but a clean reference orbit is available. That is, in addition to an observation set $Y = \{y(i)\}$, one is given an orbit segment, to which no noise has been added, generated by the same chaotic system but with a different initial condition.

3. State estimation when the system dynamics are unknown and only the observation set is available for performing state estimation.

In addition, much of the state estimation work involving the second and third scenarios above has dealt with the use of embedding techniques [21] for reconstructing the system dynamics. With the most popular of these techniques, one assumes that only a single component $x_i(t)$ of the $\mathcal{N}$-dimensional state vector $x(t)$ is available. From the scalar observations $x_i(t)$, one creates vectors $z(t)$ that consist of time-delayed samples of $\{x_i(t)\}$. Specifically, one chooses constants $T_1, T_2, \ldots, T_{M-1}$ and defines the vectors $z(t) = [x_i(t), x_i(t - T_1), \ldots, x_i(t - T_{M-1})]^T$. Usually, the $T_i$ are chosen to be equally spaced, so that $T_i = i\tau$ for some constant $\tau$. If $\tau$ and $M$ are properly chosen, the attractor of the new system implicitly determined by $z(t)$ will be an embedding of the original attractor. Equivalently, if $g$ denotes the system implicitly defined by $z(t)$, then there will be a diffeomorphism $h$ such that $g = h \circ f \circ h^{-1}$.
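The delay-vector construction just described, with equally spaced delays $T_i = i\tau$ measured in samples, can be sketched as follows. The function name and the choice to drop the first $(M-1)\tau$ samples, for which not all delays exist, are our assumptions.

```python
def delay_embed(series, M, tau):
    """Form z(t) = [x(t), x(t - tau), ..., x(t - (M-1) tau)] from scalar samples.

    Only times t for which all M delayed samples exist are returned.
    """
    start = (M - 1) * tau
    return [[series[t - k * tau] for k in range(M)] for t in range(start, len(series))]
```

For example, `delay_embed([0, 1, 2, 3, 4, 5], 3, 2)` returns `[[4, 2, 0], [5, 3, 1]]`. Choosing $M$ and $\tau$ is exactly the rule-of-thumb step whose reliability in a state estimation setting is questioned below.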

Although  embedding  techniques  have  received  much  attention  in  recent  years,  they  are 
still  poorly  understood.  More  importantly,  common  rules  of  thumb  for  choosing  delays  and 
dimensions  of  the  new  state  vectors  may  not  hold  in  the  context  of  state  estimation.  The 
problem  of  state  estimation  with  chaotic  systems  involving  the  original  state  vectors  is  a 
challenging  problem  unto  itself,  and  some  of  the  fundamental,  poorly  understood  aspects 
of  this  problem  may  be  obscured  if  embedding  considerations  are  incorporated  into  the 
problem.  In  light  of  this,  although  the  state  estimation  algorithms  introduced  in  the  next 
chapter  are  applicable  to  systems  derived  by  embedding,  we  do  not  deal  with  embedded 
systems  in  this  thesis. 

One of the earliest reported state estimation algorithms for chaos is an iterative, approximate ML approach [22]. As noted in Section 3.3, if $Y = Y(M,N) = \{y(i)\}_{i=M}^{N}$, each noise term $v(n)$ has zero mean and covariance matrix $R$, and $f$ is invertible, the log-likelihood function $\log p(Y(M,N); x(n_0))$ for the restricted DTS/DTO scenario is given by

$$\log p(Y(M,N); x(n_0)) = C(M,N) - \frac{1}{2}\sum_{i=M}^{N}\left[y(i) - h\big(f^{i-n_0}(x(n_0))\big)\right]^T R^{-1}\left[y(i) - h\big(f^{i-n_0}(x(n_0))\big)\right] \tag{3.43}$$

where $C(M,N)$ is a normalizing constant. When $h$ is the identity operator and $R = \sigma^2 I_{\mathcal{N}}$, where $\sigma$ is a real-valued constant and $I_{\mathcal{N}}$ is the $(\mathcal{N} \times \mathcal{N})$ identity matrix, this reduces to

$$\log p(Y(M,N); x(n_0)) = C(M,N) - \frac{1}{2\sigma^2}\sum_{i=M}^{N}\left\|y(i) - f^{i-n_0}(x(n_0))\right\|^2 \tag{3.44}$$

In [22], each difference $y(i) - f^{i-n_0}(x(n_0))$ in the above expression is approximated by the first-order term in a Taylor series expansion:

$$y(i) - f^{i-n_0}(x(n_0)) = f^{i-n_0}\big(f^{-(i-n_0)}(y(i))\big) - f^{i-n_0}(x(n_0)) \tag{3.45}$$

$$\approx D\{f^{i-n_0}(x(n_0))\}\big(f^{-(i-n_0)}(y(i)) - x(n_0)\big) \tag{3.46}$$

where $D\{f^{i-n_0}(x(n_0))\}$ is the derivative of $f^{i-n_0}$ evaluated at $x(n_0)$. With these approximations, $\log p(Y(M,N); x(n_0))$ reduces to

$$\log p(Y(M,N); x(n_0)) \approx C(M,N) - \frac{1}{2\sigma^2}\sum_{i=M}^{N}\left\|D\{f^{i-n_0}(x(n_0))\}\big(f^{-(i-n_0)}(y(i)) - x(n_0)\big)\right\|^2 \tag{3.47}$$


in which each residual inside the norm is linear in the unknown state $x(n_0)$ if $D\{f^{i-n_0}(x(n_0))\}$ is independent of $x(n_0)$, which in general it is not. By assuming that $D\{f^{i-n_0}(x(n_0))\}$ is in fact independent of $x(n_0)$, one can obtain a closed-form expression for the ML estimate of $x(n_0)$ by differentiating (3.47) and solving for $x(n_0)$. Doing so yields

$$\hat{x}(n_0) = \left[\sum_{i=M}^{N} D^T\{f^{i-n_0}(x(n_0))\}\, D\{f^{i-n_0}(x(n_0))\}\right]^{-1} \sum_{i=M}^{N} D^T\{f^{i-n_0}(x(n_0))\}\, D\{f^{i-n_0}(x(n_0))\}\, f^{-(i-n_0)}(y(i)) \tag{3.48}$$


where $D^T\{f^{i-n_0}(x(n_0))\}$ denotes the transpose of $D\{f^{i-n_0}(x(n_0))\}$. The iterative approach used in [22] consists of using the current estimate of $x(n_0)$ to calculate the derivative matrices $D\{f^{i-n_0}(x(n_0))\}$ and then evaluating the above expression to obtain the next estimate of $x(n_0)$. As suggested in [22], this approximate ML state estimation approach is effective only if the approximated differences are small or, equivalently, if the SNR is large. In Chapter 7, we introduce an estimator for a special class of piecewise linear maps for which an expression similar to (3.48) provides the exact ML state estimate. For this estimator, the appropriate derivatives $D\{f^{i-n_0}(x(n_0))\}$ are separately determined with a technique that uses hidden Markov modeling and Viterbi decoding.
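For a scalar, invertible map the matrices in (3.48) are numbers, and each iteration of the approach of [22] becomes a weighted average of the back-projected observations $f^{-(i-n_0)}(y(i))$ with weights $D\{f^{i-n_0}(x(n_0))\}^2$. The sketch below assumes, for simplicity, that the observations start at the time of interest ($M = n_0$), that $h$ is the identity, and that the caller supplies $f$, its inverse `finv`, and its derivative `df`. These restrictions and names are ours.

```python
def approx_ml_estimate(ys, f, finv, df, x_init, n_iter=10):
    """Iterate Eq. (3.48) in scalar form; ys[k] is y(n0 + k). Returns an estimate of x(n0)."""
    x_hat = x_init
    for _ in range(n_iter):
        num = den = 0.0
        for k, y in enumerate(ys):
            # D{f^k(x_hat)} by the chain rule along the orbit of the current estimate
            d, x = 1.0, x_hat
            for _ in range(k):
                d *= df(x)
                x = f(x)
            # Back-project the observation: f^{-k}(y(n0 + k))
            back = y
            for _ in range(k):
                back = finv(back)
            num += d * d * back
            den += d * d
        x_hat = num / den
    return x_hat
```

For a linear map $f(x) = ax$ the weights do not depend on the estimate, so the iteration returns the exact ML estimate $\sum_k a^k y(n_0+k) / \sum_k a^{2k}$. For nonlinear maps it inherits the small-error assumption discussed above.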

Because  of  the  form  of  the  log-likelihood  function  given  by  (3.43),  one  can  treat  this 
function  as  an  energy  function  and  pose  the  problem  of  ML  state  estimation  as  a  constrained 
optimization  problem.  Note  that  this  form  arises  because  of  the  additive,  white,  Gaussian 
assumption  on  the  observation  noise  and  the  assumption  of  zero  driving  noise.  In  particular, 
(3.43) can be expressed as

$$\log p(Y(M,N); x(n_0)) = C(M,N) - \frac{1}{2}\sum_{i=M}^{N}\left[y(i) - h(z(i))\right]^T R^{-1}\left[y(i) - h(z(i))\right] + \sum_{i=M+1}^{N}\alpha_i^T\left[z(i) - f(z(i-1))\right] \tag{3.49}$$

where the $\{z(i)\}$ are the unknowns and the $\{\alpha_i\}$ are Lagrange multipliers. For many chaotic maps, the chaotic nature of the dynamics precludes straightforward optimization of this cost function using standard optimization techniques.

A cost function similar in appearance to the one above, but fundamentally different and easier to minimize, is considered by Kostelich and Yorke in [45]. In the Kostelich and Yorke cost function, the $\{\alpha_i\}$ are treated as known weighting constants rather than Lagrange multipliers. Treating the $\alpha_i$ in this way transforms the problem from a constrained minimization problem to a regularization problem with some similarity to those that arise in many image understanding applications [8]. In [45], an iterative least squares approach is used to minimize the resulting cost function and simultaneously estimate the system dynamics.
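As an illustration of the regularization view, the sketch below minimizes, by plain gradient descent, a cost for a scalar map of the assumed form $J = \sum_i [y(i) - z(i)]^2 + \alpha \sum_i [z(i) - f(z(i-1))]^2$ with a single fixed weight $\alpha$. The squared dynamics penalty, the single weight, the known map $f$, and the optimizer are all our assumptions for illustration. The procedure in [45] uses iterative least squares and also estimates the dynamics, and its exact cost may differ.

```python
def ky_cost(z, y, f, alpha):
    """J(z) = sum (y(i) - z(i))^2 + alpha * sum (z(i) - f(z(i-1)))^2 (assumed form)."""
    fit = sum((yi - zi) ** 2 for yi, zi in zip(y, z))
    dyn = sum((z[i] - f(z[i - 1])) ** 2 for i in range(1, len(z)))
    return fit + alpha * dyn


def ky_gradient(z, y, f, df, alpha):
    """Gradient of ky_cost with respect to the unknown orbit z."""
    g = [2.0 * (zi - yi) for zi, yi in zip(z, y)]
    for i in range(1, len(z)):
        r = z[i] - f(z[i - 1])                 # dynamics residual at time i
        g[i] += 2.0 * alpha * r
        g[i - 1] -= 2.0 * alpha * r * df(z[i - 1])
    return g


def ky_smooth(y, f, df, alpha, step=0.01, n_iter=500):
    """Plain gradient descent on ky_cost, started at the observations."""
    z = list(y)
    for _ in range(n_iter):
        g = ky_gradient(z, y, f, df, alpha)
        z = [zi - step * gi for zi, gi in zip(z, g)]
    return z
```

Because the $\alpha$ are fixed, the result trades fidelity to the data against consistency with the dynamics rather than enforcing the dynamics exactly, which is the distinction from the Lagrangian (3.49) drawn above.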

An  alternative  approach  to  fixed-interval  smoothing  with  chaotic  maps  is  considered 
in  [23,  35].  The  technique  discussed  in  [35]  is  applicable  to  two-dimensional,  invertible 
chaotic  maps  with  known  system  dynamics.  The  technique  in  [23]  is  similar  but  does  not 
require  knowledge  of  the  system  dynamics  and  instead  uses  a  locally  linear  estimate  of  the 
dynamics. The purported motivation for both approaches is the Shadowing Lemma [10, 11], a lemma that applies to a certain class of well-understood, nondissipative chaotic systems known as Anosov diffeomorphisms.

The state estimation algorithms presented in [23, 35] are iterative algorithms in which one starts with a set of noise-corrupted observations $Y(0,N)$ for the restricted DTS/DTO scenario, with the observation function $h$ in (3.6) given by the identity operator. In each iteration, one first estimates the stable and unstable manifolds [21] of the undriven system associated with the state $x(n_0)$ at each time $n_0$ using the current state estimates. Next, two stable autoregressive processes, one running forward in time along the estimated stable manifold and the other running backward in time along the estimated unstable manifold, are used to propagate correction terms, with a pair of correction terms associated with the state at each time $n_0$. The pair of correction terms associated with the state at a given time is then added to the current estimate of the state, yielding a new estimate. The overall objective of updating the state estimates is not to reduce the estimation error but to ensure that the new estimates obey the system dynamics more closely than the previous estimates. That is, if $f$ denotes the system, $x_{\text{new}}(n_0)$ the new state estimate at time $n_0$, and $x_{\text{old}}(n_0)$ the previous estimate, then the correction terms for time $n_0$ are chosen with the goal that $\|x_{\text{new}}(n_0) - f(x_{\text{new}}(n_0 - 1))\|$ be less than $\|x_{\text{old}}(n_0) - f(x_{\text{old}}(n_0 - 1))\|$. Although not discussed in [23, 35], an inherent assumption in the algorithms is that the initial noise term $v(0)$ has no component in the direction of the stable manifold of $x(0)$ and the final noise term $v(N)$ has no component in the direction of the unstable manifold of $x(N)$.

Because the local stable and unstable manifolds of a chaotic orbit and the expansion rates associated with these manifolds apply only to infinitesimally small perturbations, the algorithms are potentially useful only when the initial SNR is large. In fact, experimental results suggest that the algorithms perform poorly below 20 dB, and even with large SNRs the algorithms often behave erratically.

As  is  shown  in  Chapters  4  and  5,  the  existence  of  both  positive  and  negative  Lyapunov 
exponents  for  dissipative,  chaotic  diffeomorphisms  has  a  profound  impact  on  achievable  state 
estimator  performance  with  these  systems.  In  addition,  the  existence  of  these  exponents 
and  the  associated  stable  and  unstable  manifolds  is  necessarily  a  fundamental  consideration 
when  developing  practical,  effective,  probabilistic  state  estimators  for  dissipative  chaotic 
systems. 
Hidden Markov modeling approaches to state estimation with chaotic maps are presented in [56, 64]. With both approaches, the dynamics of the underlying deterministic chaotic system are approximated with a finite-state hidden Markov model [74, 75], with the output of each state a constant vector in [56] and a Gaussian random vector in [64]. Neither approach requires knowledge of the actual system dynamics, but both require availability of a noise-free reference orbit for estimating transition probabilities and output distributions.

The approach discussed in [56] is an iterative fixed-interval smoothing approach and is applicable only with bounded, uniformly distributed observation noise. At each iteration, a local partitioning of state space is first performed around the current estimate of each orbit point. Each partition element corresponds to a state of the assumed underlying Markov chain. Next, a noise-free reference orbit is used to estimate transition probabilities between the states associated with one orbit point and those associated with the next. Finally, the Viterbi algorithm is used to determine the most likely state sequence (actually the centroids of the states) given the noisy observations, the transition probabilities, and the assumption on the observation noise.
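The decoding step of [56] relies on the Viterbi algorithm. A generic log-domain sketch for a discrete-state hidden Markov model follows. The emission model (scalar observation noise uniform on $[-\varepsilon, \varepsilon]$ around each state centroid) mirrors the bounded-noise assumption of [56), but the single fixed set of states, the names, and the parameters are illustrative assumptions of ours, not the per-orbit-point partitioning used in [56].

```python
import math


def uniform_log_lik(y, centroids, eps):
    """log p(y | state k) for noise uniform on [-eps, eps] around each centroid."""
    return [-math.log(2.0 * eps) if abs(y - c) <= eps else -math.inf for c in centroids]


def viterbi(log_A, log_pi, log_b):
    """Most likely state sequence.

    log_A[j][k] = log P(state k at n+1 | state j at n), log_pi[k] = log P(state k at 0),
    log_b[n][k] = log p(y(n) | state k).
    """
    K = len(log_pi)
    delta = [log_pi[k] + log_b[0][k] for k in range(K)]
    backptrs = []
    for n in range(1, len(log_b)):
        ptr, new_delta = [], []
        for k in range(K):
            # best predecessor of state k at time n
            j_best = max(range(K), key=lambda j: delta[j] + log_A[j][k])
            ptr.append(j_best)
            new_delta.append(delta[j_best] + log_A[j_best][k] + log_b[n][k])
        backptrs.append(ptr)
        delta = new_delta
    # backtrack from the most likely final state
    path = [max(range(K), key=lambda k: delta[k])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]
```

For example, with centroids 0 and 1, half-width $\varepsilon = 0.6$, and a transition matrix that strongly favors alternation, the observations 0.1, 0.5, 0.45, 0.9 decode to the state sequence 0, 1, 0, 1. The corresponding state estimates are the centroids along that path.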

The approach in [64] uses a single set of states for all noisy orbit points. These states do not correspond to a partitioning of the phase space. Instead, they correspond to a modeling of the dynamics of the chaotic system as a finite-state, first-order hidden Markov model, in which the noise-free output corresponding to each state is a Gaussian random vector. The state transition probabilities, as well as the mean vectors and covariance matrices of the states, are estimated from the clean reference orbit using the Baum-Welch re-estimation formulas. Each observed noisy orbit point is modeled as the sum of the output of a state of the underlying hidden Markov model and an independent random vector that represents the contribution of the observation noise. Heuristic ML and MMSE state estimators based on this model are introduced in [64]. The estimators perform reasonably well with small to moderate input SNRs, with performance for larger input SNRs strongly dependent on the number of states. The MMSE state estimator would be classified as a global technique in the taxonomy of nonlinear, suboptimal, MMSE state estimators discussed earlier in the chapter.

In  Chapter  6,  optimal  and  suboptimal  detection  and  estimation  algorithms  based  on 
hidden  Markov  modeling  are  introduced  for  a  special  class  of  one-dimensional,  chaotic  maps. 
The  appropriateness  and  value  of  using  HMMs  in  conjunction  with  these  maps  arises  from 
the fact that these maps give rise to homogeneous, finite-state Markov chains.
Chapter 4


State  Estimation  with  Dissipative, 
Chaotic  Maps 

4.1  Introduction 


In this chapter, we consider probabilistic state estimation with chaotic maps and introduce practical, suboptimal Maximum-Likelihood (ML) and Minimum-Mean-Squared-Error (MMSE) state estimators. We begin by discussing the experimentally observed properties of the likelihood function for the restricted DTS/DTO scenario given by (3.5) and (3.6) for the special case in which $h$ is the identity operator and $f$ is a two-dimensional, dissipative, chaotic map. We offer a nonrigorous explanation of these properties, which is supported by the performance bound analysis in Chapter 5. We also introduce a simple, grid-based ML estimator, elements of which are incorporated in the MMSE estimators introduced later in the chapter.

The  chapter  then  focuses  on  MMSE  state  estimation  and  introduces  two  heuristic  MMSE 
state  estimators — one,  a  local  estimator  based  on  extended  Kalman  filtering,  and  the  other, 
a  global  estimator  that  approximates  the  conditional  mean  integral  given  by  (3.19)  with 
a  recursively  calculable,  finite  summation.  Performance  results  for  both  estimators  are 
presented  and  compared.  The  results  suggest  that  the  local  approach  is  more  effective  with 
larger  input  SNRs  and  the  global  approach  more  effective  with  smaller  input  SNRs. 

In this chapter, our focus is on problem scenarios in which the input signal-to-noise ratio (SNR) is smaller than 20 dB, a heretofore relatively unexplored problem in the context of chaos, but one that arises in real-world applications, such as secure communication, that potentially involve chaotic systems. All three problem scenarios listed in Section 3.6 (known system dynamics; unknown system dynamics with a noise-free reference orbit available; and unknown system dynamics with no noise-free reference orbit available) are dealt with in this chapter, with emphasis placed on the second scenario. Because there are presently few practical applications involving chaos, it is unclear which of these scenarios has the most practical relevance.

4.2  ML  State  Estimation 

In this section, we consider ML state estimation with dissipative, chaotic diffeomorphisms. Although two specific maps, the Henon and Ikeda maps, are emphasized, many of the results apply to other dissipative, chaotic maps as well, even those with state vector dimensions greater than two. The section first qualitatively analyzes the properties of the likelihood function that arises with the DTS/DTO scenario, in part to motivate the approximate ML state estimator introduced later in the section. The specific problem considered is that of estimating $x(n_0)$, the state of the dissipative, chaotic map $f$ at time $n_0$, given the observation set $Y = Y(M,N) = \{y(i)\}_{i=M}^{N}$ for the restricted DTS/DTO scenario with state and observation equations (3.5) and (3.6), respectively.

4.2.1  Properties  of  the  Likelihood  Function 

As shown in Chapter 3, the log-likelihood function $\log p(Y; x(n_0))$ has the following form for the restricted DTS/DTO scenario:

$$\log p(Y; x(n_0)) = C(M,N) - \frac{1}{2}\sum_{i=M}^{N}\left[y(i) - h\big(f^{i-n_0}(x(n_0))\big)\right]^T R^{-1}\left[y(i) - h\big(f^{i-n_0}(x(n_0))\big)\right] \tag{4.1}$$

where $C(M,N)$ is a normalizing constant and $R$ is the covariance matrix of the observation noise. Note that (4.1) is a sum of weighted, squared-error terms in which each error is the difference between an observation vector $y(i)$ and the corresponding transformed state vector $h(f^{i-n_0}(x(n_0)))$, and the weighting matrix is the inverse of the covariance matrix of the observation noise. Because the likelihood function has a fundamental role in probabilistic state estimation, an understanding of its basic properties for a given state estimation problem is useful when developing practical state estimators for that problem. With this in mind, we now provide a nonrigorous analysis, supported by experimental results, of the properties of $p(Y; x(n_0))$ for the special case in which $h$ is the identity operator and $R$ is a diagonal, positive definite matrix with diagonal elements $\sigma_1^2, \ldots, \sigma_{\mathcal{N}}^2$, where $\mathcal{N}$ is the dimension of $x(n_0)$. Although the simulations involve only two-dimensional maps, the properties apply to higher-dimensional maps as well. Both the nonrigorous analysis presented here and the performance bound analysis presented in Chapter 5 suggest a fundamental influence of the system Lyapunov exponents on the likelihood function for chaotic systems.

We first consider the case in which $M = n_0 - m$ and $N = n_0$ in (4.1) for some positive integer $m$, which corresponds to all observations occurring before or at the time of interest $n_0$. Figures 4-1 (a) and (b) are contour plots of the likelihood function $p(Y(n_0 - 10, n_0); x(n_0))$ for a fixed time $n_0$ for the Henon and Ikeda maps with an input SNR of 6 dB. The figures depict the relative values of the likelihood function $p(Y(n_0 - 10, n_0); x(n_0))$ as a function of $x(n_0)$ for a fixed set of observations. In other words, a set of observations $Y(n_0 - 10, n_0)$ was generated using the orbit segment $\{x(i)\}_{i=n_0-10}^{n_0}$ with $x(n_0) = x_{n_0}$. The likelihood function was then evaluated for various values of $x(n_0)$ using this set of observations, and the relative likelihood values are shown in the figures. The center point in each figure is the relative likelihood value corresponding to $x_{n_0}$, and the other values are those for a rectangular grid of points (i.e., values of $x(n_0)$) centered at $x_{n_0}$ with a grid spacing of 0.002. In the figures, the nesting of contours indicates increasing values of the likelihood function. Alternative graphical representations of the data are provided in Figures 4-2 (a) and (b), which are mesh plots of the same data used for Figures 4-1 (a) and (b), respectively.

Figure 4-1: Contour plot of $p(Y(n_0 - 10, n_0); x(n_0))$ as a function of $x(n_0) = [x_1(n_0), x_2(n_0)]^T$. (a) Henon map; (b) Ikeda map.

Figure 4-2: Mesh plot of $p(Y(n_0 - 10, n_0); x(n_0))$ as a function of $x(n_0) = [x_1(n_0), x_2(n_0)]^T$. (a) Henon map; (b) Ikeda map.

Although ML state estimation entails choosing the value of $x(n_0)$ for which $p(Y(n_0 - 10, n_0); x(n_0))$ is largest, the ridge-like shape of the likelihood function in the figures suggests that the ML estimate of $x_{n_0}$, the actual value of $x(n_0)$, may be a very poor estimate for both maps. The results are similar with observations for times at or after $n_0$. Figures 4-3 (a) and (b) are contour plots of $p(Y(n_0, n_0 + 15); x(n_0))$ as a function of $x(n_0)$ for the Henon and Ikeda maps.

Figure 4-3: Contour plot of $p(Y(n_0, n_0 + 15); x(n_0))$ as a function of $x(n_0)$. (a) Henon map; (b) Ikeda map.
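The grid evaluation behind Figures 4-1 through 4-3 can be sketched for the Henon map as follows. We assume the standard parameters $a = 1.4$ and $b = 0.3$, $h$ equal to the identity, and $R = \sigma^2 I$, and we drop the constant $C(M,N)$, so the returned values are log-likelihoods relative to that constant. Offsets $k = i - n_0$ may be negative (iterating the inverse map) or positive. Apart from the 0.002 grid spacing quoted above, the grid size, parameters, and function names are our own illustrative choices.

```python
A, B = 1.4, 0.3  # standard Henon parameters (assumed)


def henon(x):
    return (1.0 - A * x[0] * x[0] + x[1], B * x[0])


def henon_inv(x):
    x1 = x[1] / B
    return (x1, x[0] - 1.0 + A * x1 * x1)


def iterate(x, k):
    """Apply f^k: the map for k > 0, its inverse for k < 0."""
    step = henon if k >= 0 else henon_inv
    for _ in range(abs(k)):
        x = step(x)
    return x


def log_likelihood(x_cand, obs, sigma2):
    """Eq. (4.1) without C(M, N), for h = identity and R = sigma2 * I.

    obs maps the offset k = i - n0 to the observation y(i).
    """
    total = 0.0
    for k, y in obs.items():
        fx = iterate(x_cand, k)
        d0, d1 = y[0] - fx[0], y[1] - fx[1]
        total += d0 * d0 + d1 * d1
    return -total / (2.0 * sigma2)


def likelihood_grid(center, obs, sigma2, half_width=5, spacing=0.002):
    """Evaluate the log-likelihood on a square grid of candidate x(n0) values."""
    return {
        (i, j): log_likelihood((center[0] + i * spacing, center[1] + j * spacing), obs, sigma2)
        for i in range(-half_width, half_width + 1)
        for j in range(-half_width, half_width + 1)
    }
```

With noise-free observations the grid maximum sits exactly at the true state. Replacing them with noisy observations (for example, generated with `random.gauss`) gives the kind of data plotted in the figures.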


A nonrigorous, first-order analysis helps explain this interesting property of the likelihood function. Consider a single term of the sum in (4.1), with $h$ the identity operator:

$$S_i(x(n_0)) = \left[y(i) - f^{i-n_0}(x(n_0))\right]^T R^{-1}\left[y(i) - f^{i-n_0}(x(n_0))\right] \tag{4.2}$$
For a small deviation $\delta$ from the actual value of $x(n_0)$, the following relation holds to first order:

$$f^{i-n_0}(x(n_0) + \delta) \approx f^{i-n_0}(x(n_0)) + D\{f^{i-n_0}(x(n_0))\}\,\delta \tag{4.3}$$

where $D\{f^{i-n_0}(x(n_0))\}$ denotes the derivative of $f^{i-n_0}$ with respect to $x(n_0)$. Substituting $x(n_0) + \delta$ for $x(n_0)$ in (4.2) and also substituting (4.3) in (4.2) yields

Si(x(n0)  +  6)  «  \y(i)  -  /"no(x(n0))  - 

xR-1  [»(*)  -  f-no(x(n0))  -  D{f-no(x(no))}s]  ,  (4.4) 


which  because  of  the  diagonal  assumption  on  R  reduces  to 


*{vk(i)-  fl>{/,-no(*(«o))}«]fc}2 

Si(x(n0 )  +  S)  «  53  ~ 


k= 1 


(4.5) 


where  t>*(t)  denotes  the  kth  element  of  the  noise  vector  u(i)  and  \p{fl  n°(x(no))}  6 
denotes  the  kth  element  of  the  vector  D{/’-n°(*(«o))}  6. 

Since $v(i)$ is zero-mean with $E\{v_k^2(i)\} = R_{kk}$, the expected value of $S_i(x(n_0) + \delta)$ is given by

$$E\{S_i(x(n_0) + \delta)\} \approx \sum_{k} \left(1 + \frac{\left[D\{f^{i-n_0}(x(n_0))\}\,\delta\right]_k^2}{R_{kk}}\right), \tag{4.6}$$

where $E$ is the expectation operator. The log-likelihood function, and consequently the likelihood function, is large if each term in the sum is small. As such, the value of the likelihood function for the perturbed state $x(n_0) + \delta$ depends on the magnitudes of the vectors

$$D\{f^{i-n_0}(x(n_0))\}\,\delta, \qquad i = M, \ldots, N. \tag{4.7}$$
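The first-order relation (4.3), and hence the vectors (4.7), can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the original analysis: it assumes the Henon map with $a = 1.4$, $b = 0.3$, builds $D\{f^{k}(x(n_0))\}$ by the chain rule as a product of one-step Jacobians (of $f$ for $k > 0$, of $f^{-1}$ for $k < 0$), and compares $f^{k}(x(n_0) + \delta) - f^{k}(x(n_0))$ with $D\{f^{k}(x(n_0))\}\,\delta$. All function names are hypothetical.

```python
import math

A, B = 1.4, 0.3  # assumed Henon parameters


def henon(p):
    x1, x2 = p
    return (1.0 - A * x1 * x1 + x2, B * x1)


def henon_inv(p):
    u1, u2 = p
    x1 = u2 / B
    return (x1, u1 - 1.0 + A * x1 * x1)


def jac(p):
    """Jacobian D{f} evaluated at p."""
    return ((-2.0 * A * p[0], 1.0), (B, 0.0))


def jac_inv(p):
    """Jacobian D{f^{-1}} evaluated at p = (u1, u2)."""
    return ((0.0, 1.0 / B), (1.0, 2.0 * A * p[1] / (B * B)))


def matmul(m, n):
    return tuple(tuple(sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)) for i in range(2))


def matvec(m, v):
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])


def flow_and_derivative(p, k):
    """Return f^k(p) and D{f^k(p)} by the chain rule; k < 0 uses f^{-1}."""
    step, dstep = (henon, jac) if k >= 0 else (henon_inv, jac_inv)
    d = ((1.0, 0.0), (0.0, 1.0))
    for _ in range(abs(k)):
        d = matmul(dstep(p), d)  # Jacobian at the current orbit point, applied last
        p = step(p)
    return p, d


def linearization_check(p, delta, k):
    """Return (||actual - first-order term||, ||first-order term||) for (4.3)."""
    fp, d = flow_and_derivative(p, k)
    fpd, _ = flow_and_derivative((p[0] + delta[0], p[1] + delta[1]), k)
    actual = (fpd[0] - fp[0], fpd[1] - fp[1])
    first_order = matvec(d, delta)
    return math.dist(actual, first_order), math.hypot(*first_order)
```

For small $\delta$ the first returned value is orders of magnitude below the second; increasing $|k|$ or $\|\delta\|$ shows where the first-order picture starts to break down.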


As discussed in Chapter 2, an $N'$-dimensional chaotic map has $N'$ Lyapunov exponents, and associated with each exponent and each point on the attractor is a linear subspace of $\mathbb{R}^{N'}$ along most of which the long-term, averaged logarithmic growth rate of infinitesimal perturbations is given by that Lyapunov exponent. More precisely, if $\lambda_1 > \lambda_2 > \cdots > \lambda_{N'}$ denote the ordered Lyapunov exponents, not repeated by multiplicity, and $E^j_{x(n_0)}$ denotes the subspace of $\mathbb{R}^{N'}$ associated with $x(n_0)$ and all Lyapunov exponents less than or equal to $\lambda_j$, then the following holds [21]:

$$\lim_{n \to \infty} \frac{1}{n} \log \left\| D\{f^{n}(x(n_0))\}\,\delta \right\| = \lambda_j \tag{4.8}$$

for any unit vector $\delta \in E^j_{x(n_0)} \setminus E^{j+1}_{x(n_0)}$, where $\|\cdot\|$ denotes the Euclidean norm and $E^j_{x(n_0)} \setminus E^{j+1}_{x(n_0)}$ denotes the set formed by taking $E^j_{x(n_0)}$ and removing the subspace $E^{j+1}_{x(n_0)}$.
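For a two-dimensional map, the limit (4.8) for both exponents is commonly estimated by propagating an orthonormal pair of tangent vectors with the Jacobian and re-orthonormalizing them (Gram-Schmidt, i.e., a QR step) at every iteration; the averaged logarithms of the resulting stretch factors approximate $\lambda_1$ and $\lambda_2$. This is a standard numerical technique, not necessarily the method of [21] or Chapter 2, and the sketch assumes the Henon map with $a = 1.4$, $b = 0.3$. Because $|\det D\{f\}| = b$ everywhere for this map, $\lambda_1 + \lambda_2 = \log b$, which gives a built-in consistency check.

```python
import math

A, B = 1.4, 0.3  # assumed Henon parameters


def henon(p):
    """One step of the Henon map f."""
    x1, x2 = p
    return (1.0 - A * x1 * x1 + x2, B * x1)


def jac(p):
    """Jacobian D{f} evaluated at p."""
    return ((-2.0 * A * p[0], 1.0), (B, 0.0))


def apply(m, v):
    """Matrix-vector product for 2x2 matrices."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])


def lyapunov_exponents(p, n_steps=20000, n_transient=1000):
    """Estimate (lambda_1, lambda_2) by propagating an orthonormal tangent frame
    with the Jacobian and re-orthonormalizing it (Gram-Schmidt) at every step."""
    for _ in range(n_transient):
        p = henon(p)  # settle onto the attractor first
    q1, q2 = (1.0, 0.0), (0.0, 1.0)
    log_sums = [0.0, 0.0]
    for _ in range(n_steps):
        j = jac(p)
        v1, v2 = apply(j, q1), apply(j, q2)
        r11 = math.hypot(*v1)  # stretch of the leading direction
        q1 = (v1[0] / r11, v1[1] / r11)
        proj = v2[0] * q1[0] + v2[1] * q1[1]
        w = (v2[0] - proj * q1[0], v2[1] - proj * q1[1])
        r22 = math.hypot(*w)  # stretch of the orthogonal complement
        q2 = (w[0] / r22, w[1] / r22)
        log_sums[0] += math.log(r11)
        log_sums[1] += math.log(r22)
        p = henon(p)
    return log_sums[0] / n_steps, log_sums[1] / n_steps
```

With the default settings the estimate of $\lambda_1$ lands near the value of roughly 0.42 usually reported for these parameters, and $\lambda_2$ is negative.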

As mentioned in Chapter 2, a frequently used but often poor approximation is that if $\delta$ is a small perturbation, not necessarily infinitesimally small, and $\delta \in E^j_{x(n_0)} \setminus E^{j+1}_{x(n_0)}$, then $\|D\{f^k(x(n_0))\}\,\delta\| \approx \|\delta\| \exp(k\lambda_j)$. Although this approximation is often poor, its implications are nonetheless often valid because of the close relation between each linear subspace $E^j_{x(n_0)}$ and a nonlinear counterpart known as a differentiable manifold. In particular, tangent to each of the linear subspaces $E^j_{x(n_0)}$ corresponding to a negative Lyapunov exponent is a nonlinear differentiable manifold, with the orbits of points on the manifold exhibiting the expected scaling behavior. That is, if $W^j_{x(n_0)}$ is the nonlinear manifold associated with the linear manifold $E^j_{x(n_0)}$, then for any point $x(n_0) + \delta \in W^j_{x(n_0)}$ the following holds for some constant $C$ [21]:

$$\left\| f^k(x(n_0)) - f^k(x(n_0) + \delta) \right\| \le C \exp(k\lambda_j). \tag{4.9}$$

(These nonlinear manifolds are defined only for negative Lyapunov exponents.)

One implication of this scaling behavior along the nonlinear manifolds is that the magnitudes of the vectors given by (4.7) are smallest for perturbations $\delta$ along the nonlinear manifold associated with $x_{n_0}$ corresponding to the smallest Lyapunov exponent of $f$ if $i - n_0 > 0$, and to the smallest Lyapunov exponent of $f^{-1}$ if $i - n_0 < 0$. Note that the Lyapunov exponents of $f^{-1}$ are the negatives of the Lyapunov exponents of $f$.

This implication suggests that the ridge-like regions of large likelihood values in Figures 4-1 (a) and (b) correspond to the nonlinear manifold associated with the smallest Lyapunov exponent of $f^{-1}$, and that the ridge-like regions of large likelihood values in Figures 4-3 (a) and (b) correspond to the nonlinear manifold associated with the smallest Lyapunov exponent of $f$. The reasoning is as follows. The results depicted in Figures 4-1 (a) and (b) are those for the case in which all observations occur before or at the time of interest, so that $i - n_0 \le 0$ in (4.6). Thus, the magnitudes of the vectors given by (4.7) are smallest for perturbed state vectors $x_{n_0} + \delta$ along the nonlinear manifold associated with $x_{n_0}$ corresponding to the smallest Lyapunov exponent of $f^{-1}$. Similarly, the results depicted in Figures 4-3 (a) and (b) are those for the case in which all observations occur at or after the time of interest, so that $i - n_0 \ge 0$ in (4.6). Thus, the magnitudes of the vectors given by (4.7) are smallest for perturbed state vectors along the nonlinear manifold associated with $x_{n_0}$ corresponding to the smallest Lyapunov exponent of $f$. Since the likelihood function is largest at points $x(n_0)$ for which the vectors in (4.7) have the smallest magnitudes, the implication is that the ridge-like regions in the figures correspond to the nonlinear manifolds associated with $x_{n_0}$ corresponding to these smallest Lyapunov exponents.
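To first order, the local direction of the ridge can be read off from $D\{f^{k}(x(n_0))\}$: the unit perturbation $\delta$ minimizing $\|D\{f^{k}(x(n_0))\}\,\delta\|$ is the right singular vector for the smallest singular value. The sketch below computes that direction in closed form for $k > 0$ and a $2 \times 2$ derivative; it assumes the Henon map with $a = 1.4$, $b = 0.3$ and works with the linearization rather than the nonlinear manifold itself, so it captures only the local slope of the ridge. For $k < 0$ the same routine would be applied to a product of inverse-map Jacobians. Function names are hypothetical.

```python
import math

A, B = 1.4, 0.3  # assumed Henon parameters


def henon(p):
    x1, x2 = p
    return (1.0 - A * x1 * x1 + x2, B * x1)


def jac(p):
    return ((-2.0 * A * p[0], 1.0), (B, 0.0))


def derivative_of_iterate(p, k):
    """D{f^k(p)} for k >= 0, accumulated by the chain rule."""
    d = ((1.0, 0.0), (0.0, 1.0))
    for _ in range(k):
        j = jac(p)
        d = tuple(tuple(sum(j[i][l] * d[l][c] for l in range(2)) for c in range(2)) for i in range(2))
        p = henon(p)
    return d


def most_contracted_direction(m):
    """Unit vector delta minimizing ||m delta||, with singular values (s_min, s_max).
    Assumes the two singular values of m are distinct."""
    (a, b), (c, d) = m
    # entries of the symmetric matrix m^T m = [[pp, qq], [qq, rr]]
    pp, qq, rr = a * a + c * c, a * b + c * d, b * b + d * d
    det = abs(a * d - b * c)
    s = pp + rr
    lam_max = 0.5 * (s + math.sqrt(max(s * s - 4.0 * det * det, 0.0)))
    # eigenvector of m^T m for lam_max; keep the better-conditioned of two formulas
    v = max(((qq, lam_max - pp), (lam_max - rr, qq)), key=lambda u: math.hypot(*u))
    n = math.hypot(*v)
    delta = (-v[1] / n, v[0] / n)  # orthogonal to the most stretched direction
    s_max = math.sqrt(lam_max)
    return delta, det / s_max, s_max
```

Dividing the determinant by the largest singular value, rather than subtracting nearly equal quantities, keeps the small singular value accurate even when it is many orders of magnitude below the large one.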

If correct, this conclusion applies to higher-dimensional chaotic systems as well. However, in contrast to two-dimensional, dissipative, chaotic diffeomorphisms, which always have one positive and one negative Lyapunov exponent, higher-dimensional, dissipative, chaotic diffeomorphisms may have several negative and/or several positive Lyapunov exponents. As a result, for higher-dimensional systems, if other Lyapunov exponents of $f^{-1}$ are comparable in size to its smallest exponent, then the likelihood function corresponding to observations at times before or at $n_0$ might be large for values of $x(n_0)$ on a higher-dimensional manifold associated with $x_{n_0}$. A similar result applies if other Lyapunov exponents of $f$ are comparable in size to its smallest exponent and all observations are at times at or after $n_0$.

The question arises as to the behavior of the likelihood function when it includes observations at times both before and after $n_0$. Figures 4-4 (a) and (b) are contour plots of this likelihood function $p(Y(M, N); x(n_0))$ for the Henon map with two sets of values for $M$ and $N$ satisfying $M < n_0 < N$, and Figures 4-5 (a) and (b) are analogous contour plots for the Ikeda map. As with the earlier contour plots, the figures depict the likelihood function evaluated at a rectangular grid of values for $x(n_0)$ centered at $x_{n_0}$, with a fixed observation set $Y$ generated with $x_{n_0}$. As indicated by the figures, the likelihood function is multimodal with only a few observations, but rapidly becomes impulse-like as the number of past and future observations increases.

Figure 4-4: Contour plot of $p(Y(M, N); x(n_0))$ as a function of $x(n_0)$ for the Henon map: (a) $M = n_0 - 3$ and $N = n_0 + 12$; (b) $M = n_0 - 5$ and $N = n_0 + 20$.

Figure 4-5: Contour plot of $p(Y(M, N); x(n_0))$ as a function of $x(n_0)$ for the Ikeda map: (a) $M = n_0 - 3$ and $N = n_0 + 15$; (b) $M = n_0 - 6$ and $N = n_0 + 25$.

Figures 4-6 (a) and (b) depict analogous results for a slightly shifted rectangular grid of values for $x(n_0)$. In particular, the grid used for these figures was centered at the perturbed state $x_{n_0} + [1.5 \times 10^{-4},\ 1.5 \times 10^{-4}]^T$, so that $x_{n_0}$ was not a grid point. The grid spacing in both figures, as in the earlier figures, is 0.002. A comparison of Figures 4-4 (b) and 4-6 (a), and of Figures 4-5 (b) and 4-6 (b), reveals an extreme sensitivity of the likelihood function to the location of the grid points relative to $x_{n_0}$ (the grid spacing being unchanged) when the numbers of past and future observations are not small. This sensitivity turns out to have an important influence on the performance of the approximate ML state estimator introduced in the next section.

Figure 4-6: Contour plot of $p(Y(M, N); x(n_0))$ as a function of $x(n_0)$, with the value for $x_{n_0}$ not shown: (a) Henon map with $M = n_0 - 5$ and $N = n_0 + 20$; (b) Ikeda map with $M = n_0 - 6$ and $N = n_0 + 25$.
The impulse-like nature of the likelihood function $p(Y(M, N); x(n_0))$ for invertible, dissipative, chaotic systems with observations at both past and future times suggests that the ML estimate of $x_{n_0}$ becomes increasingly accurate as the numbers of past and future observations increase. However, several practical considerations thwart maximization of the likelihood function. First, the likelihood function for a dissipative, chaotic map is highly nonlinear, and as a consequence there is generally no closed-form expression for the ML estimate of $x_{n_0}$. Second, the likelihood function generally has multiple local maxima for small numbers of observations; this property, coupled with the impulse-like nature of the likelihood function for larger numbers of observations, precludes the straightforward use of standard nonlinear optimization techniques for numerically maximizing the likelihood function. Third, an attractor of $f$ is a repeller of the inverse system $f^{-1}$. That is, whereas a dissipative, chaotic system $f$ shrinks volumes in the basin of attraction of an attractor, the inverse system $f^{-1}$ expands volumes and is thus inherently unstable. Because of this instability, small numerical errors are amplified under $f^{-1}$, and the computed orbits of $f^{-1}$ generated by most initial conditions on the attractor leave the attractor and are typically unbounded. As a result, it is generally difficult to obtain accurate estimates of the backward orbit of a point, i.e., the orbit with the given point as the final condition. Finally, the use of standard optimization techniques requires knowledge of the system dynamics, which may not always be available.
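The backward-orbit difficulty is easy to reproduce. The sketch below, which assumes the Henon map with $a = 1.4$, $b = 0.3$ and uses hypothetical function names, iterates a point on the attractor forward $k$ steps and then applies the exact inverse map $k$ times. In exact arithmetic the round trip returns the starting point; in floating point, the round-off introduced on the forward pass is amplified on the backward pass at a rate set by the magnitude of the negative Lyapunov exponent of $f$, so the round trip succeeds for small $k$ and fails completely for moderate $k$.

```python
import math

A, B = 1.4, 0.3  # assumed Henon parameters


def henon(p):
    x1, x2 = p
    return (1.0 - A * x1 * x1 + x2, B * x1)


def henon_inv(p):
    u1, u2 = p
    x1 = u2 / B
    # products (not **) so that overflow yields inf instead of raising OverflowError
    return (x1, u1 - 1.0 + A * x1 * x1)


def round_trip_error(p0, k):
    """Iterate f forward k steps, then f^{-1} back k steps; return the distance to p0.
    The result may be inf or nan once the backward iterates have blown up."""
    p = p0
    for _ in range(k):
        p = henon(p)
    for _ in range(k):
        p = henon_inv(p)
    return math.dist(p, p0)
```

Products are written as `x1 * x1` rather than `x1 ** 2` because Python raises `OverflowError` when a float power overflows, whereas an overflowing float product simply becomes `inf`.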

Nonetheless, potentially effective approximate ML state estimation is possible with dissipative, chaotic diffeomorphisms even when the system dynamics are unknown. The next section introduces a simple state estimator that circumvents the above-mentioned problems by exploiting the topological transitivity of chaotic systems.

4.2.2  An  Approximate  ML  State  Estimator 

In this section, we introduce an approach for practical, suboptimal ML state estimation with dissipative, chaotic maps. We consider a more general problem than that of estimating $x(n_0)$, the state at time $n_0$. In particular, we consider the problem of estimating the $(N+1)$-point orbit segment $\{f^i(x(0))\}_{i=0}^{N}$ given the observation set $Y(0, N)$. In theory, the two problems are equivalent since the system dynamics are deterministic. That is, if $\hat{x}_{ML}(n_0)$ denotes the ML estimate of $x(n_0)$ for a given observation set $Y(0, N)$, then $f^{n-n_0}(\hat{x}_{ML}(n_0))$ is the ML estimate of $x(n)$ for arbitrary time $n$ for the same observation set. However, for the approximate ML approach considered in this section, experimental performance results are generally superior when the state is estimated separately at each time $n$. This appears to be a consequence of the simultaneous presence of both positive and negative Lyapunov exponents with a dissipative, chaotic map $f$, which leads to amplification of estimation errors in state estimates acted upon by $f$ or $f^{-1}$. In other words, if $\hat{x}(n)$ denotes an estimate of $x(n)$ with nonzero squared estimation error $\epsilon^2$, then the squared estimation errors for the estimates $f(\hat{x}(n))$ and $f^{-1}(\hat{x}(n))$ of $x(n+1)$ and $x(n-1)$, respectively, are generally larger than $\epsilon^2$. This difference between theory and practice results both from nonoptimality of the estimator and from computer round-off error. In Chapter 7, we introduce an optimal ML state estimator for a class of expanding, one-dimensional maps with a similar difference between theory and practice, but with the difference attributable in this case solely to computer round-off error.

An underlying requirement of the estimation approach introduced here is that the orbit segment $\{f^i(x(0))\}_{i=0}^{N}$ to be estimated lies on a chaotic attractor. This requirement is assumed to hold in the following discussion. Also, unless stated otherwise, the distance between two orbit points refers to the Euclidean distance between the points, and the distance between orbit segments refers to the sum of the Euclidean distances between corresponding points on the orbits divided by the number of segment points.

One numerical method for ML parameter estimation, and the method we use here, is grid search. With this method, one evaluates the likelihood function at a finite set of parameter values and uses the parameter value for which the likelihood function is largest as the parameter estimate. For the ML state estimation problem of interest here, the parameter corresponds to $x_{n_0}$, the actual unknown state at time $n_0$, where $n_0 \in [0, N]$. An important consideration with this estimation method is the selection of an appropriate set of possible parameter values, which for the state estimation problem correspond to trial values for $x_{n_0}$. Complicating the task of choosing an appropriate set of trial values are two practical problems noted earlier that arise with dissipative, chaotic systems: the difficulty in generating accurate backward orbits for points, and the impulse-like nature of the likelihood function with large numbers of observations at past and future times. The first problem is relevant because backward orbit segments for the trial state values are used in the likelihood function if $n_0 \neq 0$, as indicated by (4.1). The second problem is relevant if $N$ is large. One simple, albeit suboptimal, way to circumvent this second problem is to use only observations for times near $n_0$ in the likelihood function. However, the first problem remains, since observations for times both before and after $n_0$ are needed in the likelihood function to avoid the undesirable ridge-like behavior that occurs when all observations are for times at or after $n_0$. A desirable situation would be one in which an appropriate set of trial values was available for which the orbit, or at least an adequately long orbit segment, containing each of the values was known as well.

Since a chaotic system is topologically transitive, most orbits on an attractor have points that come arbitrarily close to all other points on the attractor. In light of this and the underlying requirement that the unknown orbit segment lie on the attractor, the points on almost any sufficiently long orbit segment on the attractor are useful trial values for grid-based ML state estimation: the orbit corresponding to these points is known, and some of them pass close to $x_{n_0}$, arbitrarily close as the segment length grows.

Combining these related ideas leads to the following grid-based, approximate ML state estimator for estimating the state at each time $n_0 \in [0, N]$:

Approximate ML State Estimator

1. Generate a reference orbit with an arbitrary initial condition (with the possible exception of a set of measure zero) in the basin of attraction of the attractor. Let $\{o_i\}$ denote the set of orbit points.

2. Evaluate the likelihood function $p(Y(n_0 - m, n_0 + r); o_i)$ for the given observation set and each reference orbit point $o_i$.

3. Choose as the ML state estimate at time $n_0$ the reference orbit point that maximizes the likelihood function.
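For the case used in the experiments below ($h$ equal to the identity, Gaussian noise with diagonal covariance $R$), maximizing the likelihood in step 3 amounts to minimizing a weighted squared distance between the observation window and each reference orbit window. The sketch below illustrates steps 2 and 3 under those assumptions. It estimates only times $n_0$ whose full window $[n_0 - m, n_0 + r]$ lies inside the observation set and considers only reference points with a full window; how boundary times were handled in the original experiments is not stated here. The Henon reference orbit ($a = 1.4$, $b = 0.3$) and all function names are illustrative assumptions.

```python
A, B = 1.4, 0.3  # assumed Henon parameters, used only to build an example reference orbit


def henon_orbit(p, n, transient=1000):
    """n consecutive Henon orbit points after discarding a transient (illustrative helper)."""
    for _ in range(transient):
        p = (1.0 - A * p[0] * p[0] + p[1], B * p[0])
    orbit = []
    for _ in range(n):
        orbit.append(p)
        p = (1.0 - A * p[0] * p[0] + p[1], B * p[0])
    return orbit


def segment_distance(obs, n0, ref, i, m, r, r_diag):
    """Weighted squared distance between obs[n0-m..n0+r] and ref[i-m..i+r].
    With h = identity and R = diag(r_diag), this equals -2 log p(Y(n0-m, n0+r); o_i)
    up to an additive constant."""
    total = 0.0
    for j in range(-m, r + 1):
        y, o = obs[n0 + j], ref[i + j]
        total += sum((yk - ok) * (yk - ok) / rk for yk, ok, rk in zip(y, o, r_diag))
    return total


def approx_ml_estimates(obs, ref, m, r, r_diag):
    """Steps 2-3: for each n0 with a full window, return the reference orbit
    point o_i whose window best matches the observations."""
    candidates = range(m, len(ref) - r)  # reference points with a full window
    estimates = {}
    for n0 in range(m, len(obs) - r):
        best = min(candidates, key=lambda i: segment_distance(obs, n0, ref, i, m, r, r_diag))
        estimates[n0] = ref[best]
    return estimates
```

In practice `obs` would be a noise-corrupted orbit segment; drawing noise-free observations from the reference orbit itself is just a convenient sanity check, since the matching reference window then has distance exactly zero.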

The algorithm requires specification of the reference orbit length as well as the constants $m$ and $r$, which determine the numbers of observations at past and future times used in the likelihood function. Although an optimal technique for selecting these values remains an elusive goal, several practical rules of thumb are appropriate. In particular, for a given desired performance level (e.g., mean-squared error), the orbit must be long enough that at least one orbit point passes sufficiently close to the actual state $x_{n_0}$ for each $n_0 \in [0, N]$ for the desired performance level to be achievable. Also, the constants $m$ and $r$ define a window of observation times and should be chosen so that orbit segments passing through two neighboring points remain close, on a point-by-point basis, over the observation window. Such a window exists because the system dynamics are deterministic and invertible by assumption, so the orbits passing through two neighboring points on the attractor remain close for at least a few time steps into the future and a few time steps into the past. A fundamental aspect of the algorithm is that its use does not actually require knowledge of the system dynamics, but only the availability of a reference orbit on the attractor.
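The first rule of thumb can be examined directly by measuring how close the reference orbit comes to a given state as its length grows. The sketch below does this for a Henon reference orbit ($a = 1.4$, $b = 0.3$ assumed) and a target state taken from a different orbit on the attractor. Because each shorter reference orbit is a prefix of the longer one, the nearest-point distance can only shrink as the length increases. The helper names are hypothetical.

```python
import math

A, B = 1.4, 0.3  # assumed Henon parameters


def henon_orbit(p, n, transient=1000):
    """n consecutive Henon orbit points after discarding a transient."""
    for _ in range(transient):
        p = (1.0 - A * p[0] * p[0] + p[1], B * p[0])
    orbit = []
    for _ in range(n):
        orbit.append(p)
        p = (1.0 - A * p[0] * p[0] + p[1], B * p[0])
    return orbit


def nearest_reference_distance(state, ref):
    """Euclidean distance from a state to the closest reference orbit point."""
    return min(math.dist(state, o) for o in ref)


def coverage_profile(state, ref, lengths):
    """Nearest-point distance when only the first L reference points are used."""
    return {L: nearest_reference_distance(state, ref[:L]) for L in lengths}
```

Since the estimate of the original algorithm is itself a reference orbit point, this nearest-point distance is a lower bound on its estimation error at that time.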

As implicitly suggested by the expression for $\log p(Y(M, N); x(n_0))$ given by (4.1), in the special case that $h$ is the identity operator, the estimator is simply an orbit matcher: it chooses as the state estimate at time $n_0$ the reference orbit point whose corresponding orbit segment, as determined by $m$ and $r$, most closely matches the set of observations $Y(n_0 - m, n_0 + r)$ in a weighted least-squares sense, with the weights determined by $R^{-1}$. Equivalently, noise-free reference orbit segments are matched to noise-corrupted orbit segments. When $h$ is not the identity operator, the estimator is a transformed orbit matcher in which noise-free orbit segments, with each point transformed by $h$, are matched to noise-corrupted orbit segments whose points were also transformed by $h$ prior to the noise corruption.

This approximate ML state estimator was tested on the Henon and Ikeda maps with the identity operator used for $h$ in (3.6) and a diagonal matrix used for the noise covariance matrix $R$. Figures 4-7 (a) and (b) depict the SNR gain as a function of input SNR, with the curves parameterized by the number of points in the reference orbit segment. Each plotted point represents the average improvement in SNR for the two components of the state vector, obtained by estimating 2000 consecutive orbit points. As indicated by the figures, for input SNRs of 10 dB and above, performance improves as the size of the reference orbit increases, whereas for input SNRs smaller than 10 dB, the size of the reference orbit has little influence on performance (for reference orbit sizes in excess of 500 points). In addition, as is typical of ML-type estimators for nonlinear estimation problems, the estimator is most effective with larger input SNRs. The decrease in performance as the input SNR increases from 15 to 25 dB is attributable to the finite size of the reference orbit: the estimate can be no closer to $x_{n_0}$ than the nearest reference orbit point, so once the noise level becomes comparable to that distance, further increases in input SNR yield little further reduction in estimation error. Figures 4-8 (a) and (b) depict similar results, but with the curves now parameterized by the numbers of past and future observations $(m, r)$ used in the likelihood function. The figures suggest that performance improves initially as the values of $m$ and $r$ increase from zero, but then deteriorates as $m$ and $r$ exceed certain values. The figures also suggest that, for both maps, no single parameter pair $(m, r)$ achieves the best performance at all input SNRs.

Figure 4-7: Performance results for the approximate ML estimator with differently sized reference orbits and $(m, r) = (4, 4)$: (a) Henon map; (b) Ikeda map.

Figure 4-8: Performance results for the approximate ML estimator with different numbers of observations $(m, r)$ used in the likelihood function and a 4000-point reference orbit: (a) Henon map; (b) Ikeda map.

4.2.3  Extensions 

A straightforward extension of the approximate ML estimator yields performance comparable to that of the original algorithm but, more importantly, renders the algorithm potentially useful even when no reference orbit is available. When a reference orbit is available, the extension consists of using as the state estimate at a given time the average, possibly weighted, of the $P$ reference orbit points for which the likelihood function is largest. (An alternative extension not pursued here would be to use as the state estimate the average of all reference orbit points for which the likelihood function exceeds a suitably chosen threshold.) The original algorithm corresponds to the special case $P = 1$. As shown later in this section, when no reference orbit is available, the output transformation $h$ is the identity operator, and the number of observations $N + 1$ is sufficiently large, one obtains a nonnegligible improvement in SNR with this extension of the approximate ML estimator.
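The averaging extension changes only the last step of the estimator: instead of the single best-matching reference point, it averages the $P$ best. The sketch below uses simple (unweighted) averaging and, as in the experiments, assumes $h$ equal to the identity and a diagonal $R$; the function names are hypothetical. With $P = 1$ it reduces to the original estimator.

```python
import heapq


def segment_distance(obs, n0, ref, i, m, r, r_diag):
    """Weighted squared distance between obs[n0-m..n0+r] and ref[i-m..i+r]."""
    total = 0.0
    for j in range(-m, r + 1):
        y, o = obs[n0 + j], ref[i + j]
        total += sum((yk - ok) * (yk - ok) / rk for yk, ok, rk in zip(y, o, r_diag))
    return total


def averaging_estimate(obs, ref, n0, m, r, r_diag, P):
    """Unweighted average of the P reference points with the largest likelihood,
    i.e. the smallest weighted window distance; P = 1 gives the original estimator."""
    candidates = range(m, len(ref) - r)  # reference points with a full window
    best = heapq.nsmallest(P, candidates,
                           key=lambda i: segment_distance(obs, n0, ref, i, m, r, r_diag))
    dim = len(ref[0])
    return tuple(sum(ref[i][k] for i in best) / len(best) for k in range(dim))
```

`heapq.nsmallest` avoids sorting every candidate when $P$ is small; a weighted variant would replace the plain mean with weights derived from the window distances.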

Figures 4-9 and 4-10 depict the performance results obtained with this averaging estimator for $P = 5$, with simple averaging used for the state estimate at each time $n_0 \in [0, N]$. The parameterization of the curves in Figures 4-9 and 4-10 is the same as in Figures 4-7 and 4-8, respectively. A comparison of the plotted results with those shown in Figures 4-7 and 4-8 indicates that averaging over 5 reference orbit points offers no additional SNR gain over the use of a single reference orbit point as the state estimate. Computer experiments involving averaging over other numbers of reference orbit points also yielded no additional SNR gain.

Figure 4-9: Performance results for the averaging estimator with differently sized reference orbits, $(m, r) = (4, 4)$, and averaging over 5 reference orbit points: (a) Henon map; (b) Ikeda map.

Figure 4-10: Performance results for the averaging estimator with different numbers of observations $(m, r)$ used in the likelihood function, a 4000-point reference orbit, and averaging over 5 reference orbit points: (a) Henon map; (b) Ikeda map.

However, averaging has a useful role in the so-called "self-cleaning" problem, the problem in which no reference orbit is available, the system dynamics are unknown, and the output transformation $h$ is the identity operator, so that each observation $y(n_0)$ is simply a noise-corrupted version of $x(n_0)$, the state at that time. We assume, with some loss of generality, that the observation noise covariance matrix $R$ is known as well. By using the observation set as a reference orbit, we can apply the averaging estimator to this problem. In other words, we let the observation set $Y(0, N)$ have two roles: first, its original role in the likelihood function $p(Y(n_0 - m, n_0 + r); o_i)$ as the fixed observation subset $Y(n_0 - m, n_0 + r)$, and second, the role of the set of reference orbit points $\{o_i\}$ over which the likelihood function is maximized, where $\log p(Y(n_0 - m, n_0 + r); o_i)$ is now formally defined as

$$\log p(Y(n_0 - m, n_0 + r); o_i) = C(m, r) - \frac{1}{2} \sum_{j=-m}^{r} \left[y(n_0 + j) - o_{i+j}\right]^T R^{-1} \left[y(n_0 + j) - o_{i+j}\right] \tag{4.10}$$

$$\qquad = C(m, r) - \frac{1}{2} \sum_{j=-m}^{r} \left[y(n_0 + j) - y(i + j)\right]^T R^{-1} \left[y(n_0 + j) - y(i + j)\right], \tag{4.11}$$

where $C(m, r)$ is a normalizing constant. Strictly speaking, $p(Y(n_0 - m, n_0 + r); o_i)$ is not a likelihood function for the underlying system model when $o_i = y(i)$, because $Y(0, N)$ is not an orbit segment. As such, referring to this estimation approach as ML state estimation is inappropriate.

It follows from (4.11) that $\log p(Y(n_0 - m, n_0 + r); y(n_0)) = C(m, r)$, the largest value the function can take, so the likelihood function is maximized at time $n_0$ by $y(n_0)$ itself. Thus, choosing the observation that maximizes the likelihood function at each time $n_0$ as the state estimate would yield no improvement in SNR.
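The self-cleaning estimator can be sketched by letting the observations play both roles in (4.11). Following the description of Figures 4-11 below, the sketch excludes $y(n_0)$ itself (which always maximizes (4.11)) and averages the $P$ other observations whose windows match best. It assumes a diagonal $R$, estimates only times with a full window, and includes a simple SNR-gain helper that pools both state components; these choices and all function names are illustrative assumptions, not the original implementation.

```python
import heapq
import math


def segment_distance(y, n0, i, m, r, r_diag):
    """Weighted squared distance between y[n0-m..n0+r] and y[i-m..i+r], cf. (4.11)."""
    total = 0.0
    for j in range(-m, r + 1):
        a, b = y[n0 + j], y[i + j]
        total += sum((ak - bk) * (ak - bk) / rk for ak, bk, rk in zip(a, b, r_diag))
    return total


def self_clean(y, m, r, r_diag, P):
    """For each n0 with a full window, average the P observations y(i), i != n0,
    whose surrounding windows best match the window around n0."""
    valid = range(m, len(y) - r)
    estimates = {}
    for n0 in valid:
        others = [i for i in valid if i != n0]  # exclude the observation itself
        best = heapq.nsmallest(P, others, key=lambda i: segment_distance(y, n0, i, m, r, r_diag))
        estimates[n0] = tuple(sum(y[i][k] for i in best) / len(best) for k in range(len(y[n0])))
    return estimates


def snr_gain_db(clean, noisy, est):
    """10 log10(input error power / output error power) over the estimated times."""
    err_in = sum(math.dist(noisy[t], clean[t]) ** 2 for t in est)
    err_out = sum(math.dist(est[t], clean[t]) ** 2 for t in est)
    return 10.0 * math.log10(err_in / err_out)
```

This brute-force search costs on the order of $N^2 (m + r + 1)$ operations, which is acceptable for a few thousand points in a sketch like this; larger problems would call for a nearest-neighbor data structure.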

Figures 4-11 (a) and (b) depict the performance results obtained by applying this self-cleaning, averaging estimator to the Henon and Ikeda maps. Each plotted point is the average improvement in SNR for a 2000-point observation set with $(m, r) = (4, 4)$. In contrast to the curves in Figures 4-9 and 4-10, the curves in Figures 4-11 (a) and (b) are parameterized by the number of neighboring observations, as determined by orbit matching, that were used for averaging to estimate the state at each time $n_0$. The actual observation at each time was not included in the average for that time. As suggested by the figures, averaging over neighboring points has a beneficial role in this situation, and there is a nonnegligible SNR improvement at each input SNR for both maps. The improvement even at small input SNRs is surprising in light of the fact that no knowledge of the underlying system dynamics is used.¹

Figure 4-11: Performance results for the self-cleaning, averaging estimator with averaging over neighboring observations determined by orbit matching, a 2000-point observation set, and $(m, r) = (4, 4)$: (a) Henon map; (b) Ikeda map.

We conclude by mentioning one possible extension of this approach, the use of Wiener-type filtering. For example, given a set of suitably chosen neighboring orbit points, a Wiener-type estimator $\hat{x}(n_0)$ of the state at time $n_0$ is given by

$$\hat{x}(n_0) = \mathrm{avg}(n_0) + \Lambda_{\mathrm{avg}}(n_0)\left[\Lambda_{\mathrm{avg}}(n_0) + R\right]^{-1}\left(y(n_0) - \mathrm{avg}(n_0)\right), \tag{4.12}$$

where $\mathrm{avg}(n_0)$ denotes the averaged orbit points for time $n_0$, $\Lambda_{\mathrm{avg}}(n_0)$ denotes the (estimated) covariance […], and $R$ denotes the covariance matrix of the observation noise.
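For a two-dimensional state, (4.12) can be evaluated directly. In the sketch below, $\Lambda_{\mathrm{avg}}(n_0)$ is simply passed in as a $2 \times 2$ matrix, because the text above does not fully specify how it is estimated; the function names are hypothetical. The two limiting cases show the role of the gain: with $\Lambda_{\mathrm{avg}}(n_0) = 0$ the estimate is the neighbor average, and with $\Lambda_{\mathrm{avg}}(n_0) = R$ it lies halfway between the average and the observation.

```python
def mat2_inv(m):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def mat2_mul(m, n):
    """Product of two 2x2 matrices."""
    return tuple(tuple(m[i][0] * n[0][j] + m[i][1] * n[1][j] for j in range(2)) for i in range(2))


def wiener_estimate(avg, lam, R, y):
    """x_hat(n0) = avg + lam (lam + R)^{-1} (y - avg), following (4.12), for 2-D states.

    avg: averaged neighboring orbit points for time n0; lam: a 2x2 (estimated)
    covariance supplied by the caller; R: observation noise covariance; y: y(n0).
    """
    s = tuple(tuple(lam[i][j] + R[i][j] for j in range(2)) for i in range(2))
    gain = mat2_mul(lam, mat2_inv(s))
    innov = (y[0] - avg[0], y[1] - avg[1])
    return tuple(avg[i] + gain[i][0] * innov[0] + gain[i][1] * innov[1] for i in range(2))
```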


4.3  MMSE  State  Estimation:  Local  Techniques 

The previous sections of this chapter dealt with nonrandom state estimation, that is, estimation of an unknown but nonrandom state vector or orbit segment, and the emphasis was on ML state estimation. In contrast, this section and the next deal with state estimation in a Bayesian context, with the unknown state vector at a given time treated as a random vector, and the emphasis is on minimum-mean-squared-error (MMSE) state estimation. In this section we introduce a local, heuristic MMSE state estimator, and in the next, a global, approximate MMSE state estimator. The extended Kalman filter (EKF) provides the foundation for the local estimator discussed in this section, and variations of the estimator are provided for all three problem scenarios: known system dynamics; unknown system dynamics with a clean reference orbit available; and unknown system dynamics with no reference orbit. The global estimator introduced in the next section is applicable only to the first two of these scenarios. This latter estimator exploits the ergodicity of dissipative, chaotic systems, which allows the computationally intractable integral for the MMSE state estimator given by (3.19) to be replaced by an infinite summation that is easier to approximate.

¹ Since reporting this self-cleaning, averaging estimator [76, 77], we discovered that similar work was independently pursued and reported in [40].

4.3.1  The  Extended  Kalman  Filter  and  Smoother 

The extended Kalman filter (EKF) is a recursive state estimator that can be used with a large class of nonlinear state-space models. Unlike the Kalman filter, which is the optimal MMSE estimator for a restricted set of linear state-space models, the EKF is a heuristic algorithm that in general is not the MMSE state estimator for a given nonlinear state-estimation problem. Because of this lack of optimality, one cannot determine the performance of the EKF for a specific problem a priori, as is possible with the Kalman filter.

In this subsection, we derive the equations for the EKF for the general DTS/DTO scenario given by (3.1)-(3.2) and repeated here for reference:

$$x(n+1) = f_n(x(n)) + g_n(x(n))\,w(n) \tag{4.13}$$

$$y(n) = h_n(x(n)) + v(n). \tag{4.14}$$

An underlying assumption in the derivation is that the functions $f_n$, $g_n$, and $h_n$ in (3.1) and (3.2) are sufficiently smooth that they have Taylor series expansions.

To derive the EKF equations for the state estimate at time $n+1$, one first expands $f_n$ and $g_n$ in Taylor series about the current state estimate $\hat{x}(n|n)$. One then expands $h_n$ in a Taylor series about $\hat{x}(n|n-1)$. As for the Kalman filter, $\hat{x}(n|n-1)$ and $\hat{x}(n|n)$ denote the estimates of $x(n)$ based on the observation sets $Y(0,n-1)$ and $Y(0,n)$, respectively. The state estimate used in the expansion of each function is the most recent estimate available when that function must be evaluated. The Taylor series expansions are given by


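$$f_n(x(n)) = f_n(\hat{x}(n|n)) + F_n\,(x(n) - \hat{x}(n|n)) + \cdots \qquad (4.15)$$

$$g_n(x(n)) = G_n + \cdots \qquad (4.16)$$

$$h_n(x(n)) = h_n(\hat{x}(n|n-1)) + H_n\,(x(n) - \hat{x}(n|n-1)) + \cdots, \qquad (4.17)$$

where

$$F_n = D\{f_n(\hat{x}(n|n))\} \qquad (4.18)$$

$$G_n = g_n(\hat{x}(n|n)) \qquad (4.19)$$

$$H_n = D\{h_n(\hat{x}(n|n-1))\}. \qquad (4.20)$$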


Retaining only the terms explicitly shown in the above expansions yields the following approximations to the DTS/DTO state equation (3.1) and observation equation (3.2):

$$x(n+1) = f_n(\hat{x}(n|n)) + F_n\,(x(n) - \hat{x}(n|n)) + G_n w(n) \qquad (4.21)$$

$$x(n+1) = F_n x(n) + G_n w(n) + [\,f_n(\hat{x}(n|n)) - F_n \hat{x}(n|n)\,] \qquad (4.22)$$

$$y(n) = h_n(\hat{x}(n|n-1)) + H_n\,(x(n) - \hat{x}(n|n-1)) + v(n) \qquad (4.23)$$

$$y(n) = H_n x(n) + v(n) + [\,h_n(\hat{x}(n|n-1)) - H_n \hat{x}(n|n-1)\,]. \qquad (4.24)$$


In (4.22) and (4.24), $F_n$ and $H_n$ are matrices. The bracketed expressions can be evaluated because every quantity in them is known at the time it is needed. As a result, these equations are identical to the state and observation equations used by the Kalman filter, apart from the added deterministic input terms. The Kalman filter equations are easily modified to account for deterministic inputs in the state and observation equations. Thus, the Kalman filter can be applied to the system model given by the above equations and is, in fact, the MMSE estimator for this model. The resulting filtering equations, given in Table 4.1, constitute the extended Kalman filter for the DTS/DTO model.
Prediction Step

$$\hat{x}(n+1|n) = f_n(\hat{x}(n|n)) \qquad (4.25)$$

$$P(n+1|n) = F_n P(n|n) F_n^T + G_n Q(n) G_n^T \qquad (4.26)$$

Measurement Update Step

$$\hat{x}(n+1|n+1) = \hat{x}(n+1|n) + K(n+1)\,[\,y(n+1) - h_{n+1}(\hat{x}(n+1|n))\,] \qquad (4.27)$$

$$K(n+1) = P(n+1|n) H_{n+1}^T \,[\,H_{n+1} P(n+1|n) H_{n+1}^T + R(n+1)\,]^{-1} \qquad (4.28)$$

$$P(n+1|n+1) = [\,I_N - K(n+1) H_{n+1}\,]\, P(n+1|n) \qquad (4.29)$$

Initialization

$$\hat{x}(0|-1) = m_0 \qquad (4.30)$$

$$P(0|-1) = P_0. \qquad (4.31)$$

Table 4.1: The Extended Kalman Filter (EKF) for the DTS/DTO Scenario
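The recursion in Table 4.1 can be sketched in a few lines of NumPy. The sketch is illustrative only and rests on these assumptions:

- the dynamics are the standard Henon map $f(x) = (1 - a x_1^2 + x_2,\; b x_1)$ with $a = 1.4$, $b = 0.3$;
- the observation function is the identity;
- $G$, $Q$, and $R$ do not vary with time.

The helper names `henon`, `henon_jacobian`, and `ekf_step` are hypothetical.

```python
import numpy as np

A_HENON, B_HENON = 1.4, 0.3  # assumed (standard) Henon parameters


def henon(x):
    """Henon map f(x) = (1 - a x1^2 + x2, b x1)."""
    return np.array([1.0 - A_HENON * x[0] ** 2 + x[1], B_HENON * x[0]])


def henon_jacobian(x):
    """Derivative matrix D{f(x)} of the Henon map."""
    return np.array([[-2.0 * A_HENON * x[0], 1.0], [B_HENON, 0.0]])


def ekf_step(x_upd, P_upd, y_next, f, jac_f, h, jac_h, G, Q, R):
    """One EKF cycle: prediction (4.25)-(4.26), then measurement update (4.27)-(4.29)."""
    F = jac_f(x_upd)                               # F_n evaluated at x(n|n)
    x_pred = f(x_upd)                              # (4.25)
    P_pred = F @ P_upd @ F.T + G @ Q @ G.T         # (4.26)
    H = jac_h(x_pred)                              # H_{n+1} evaluated at x(n+1|n)
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)            # (4.28)
    x_new = x_pred + K @ (y_next - h(x_pred))      # (4.27)
    P_new = (np.eye(len(x_upd)) - K @ H) @ P_pred  # (4.29)
    return x_new, P_new


# Usage: filter a noisy Henon orbit observed through the identity operator.
rng = np.random.default_rng(0)
sigma = 0.01
x = np.zeros(2)
for _ in range(100):            # discard the transient so the orbit lies on the attractor
    x = henon(x)
orbit = []
for _ in range(200):
    orbit.append(x)
    x = henon(x)
orbit = np.array(orbit)
y = orbit + sigma * rng.standard_normal(orbit.shape)

identity = lambda z: z
identity_jac = lambda z: np.eye(2)
R = sigma ** 2 * np.eye(2)
Q = 1e-5 * np.eye(2)            # small "conditioning" noise; Q = 0 is also allowed
x_hat, P = y[0], R.copy()       # start from the first observation (see Section 4.3.2)
estimates = [x_hat]
for n in range(1, len(y)):
    x_hat, P = ekf_step(x_hat, P, y[n], henon, henon_jacobian,
                        identity, identity_jac, np.eye(2), Q, R)
    estimates.append(x_hat)
estimates = np.array(estimates)
```

When both covariances are zero, the gain $K$ vanishes and the update simply returns the prediction $f_n(\hat{x}(n|n))$. That makes a convenient sanity check.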


A comparison of the equations in Tables 3.1 and 4.1 reveals the similarity between the Kalman filter and the extended Kalman filter. A fundamental difference between the two, however, concerns what each is optimal for. The Kalman filter is the MMSE state estimator for a linear state-space model. The extended Kalman filter is the MMSE state estimator for a linearized state-space model, but only a heuristic estimator for the nonlinear model from which that linearization is derived. This distinction suggests that the effectiveness of the EKF depends largely on the accuracy of the linear approximations of the nonlinear functions $f_n$, $g_n$, and $h_n$. If the neglected higher-order terms in the Taylor series expansions of these functions are not negligible, the EKF may perform poorly.

An interesting aspect of the EKF concerns the state estimate equation in the prediction step (4.25). Combining this equation with (4.21) leads to the following sequence of equations:

$$\hat{x}(n+1|n) = E(x(n+1) \mid Y(0,n)) \qquad (4.32)$$

$$\hat{x}(n+1|n) = f_n(\hat{x}(n|n)) + F_n\,(E(x(n) \mid Y(0,n)) - \hat{x}(n|n)) + E(G_n w(n)) \qquad (4.33)$$

$$\hat{x}(n+1|n) = f_n(\hat{x}(n|n)), \qquad (4.34)$$

since $E(x(n) \mid Y(0,n)) = \hat{x}(n|n)$ and $E(G_n w(n)) = 0$. This implies

$$E(x(n+1) \mid Y(0,n)) = E(f_n(x(n)) \mid Y(0,n)) = f_n(E(x(n) \mid Y(0,n))), \qquad (4.35)$$

which, although true for linear systems, is not true in general for nonlinear systems.
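A short calculation makes the failure of (4.35) concrete. For illustration only, assume the standard Henon map, whose first component is $f_1(x) = 1 - a x_1^2 + x_2$. Write $\mu_1, \mu_2$ for the conditional means of $x_1(n)$ and $x_2(n)$ given $Y(0,n)$, and $\sigma_1^2$ for the conditional variance of $x_1(n)$. Then, since $E(x_1^2) = \mu_1^2 + \sigma_1^2$,

$$E(f_1(x(n)) \mid Y(0,n)) = 1 - a\,(\mu_1^2 + \sigma_1^2) + \mu_2 = f_1(E(x(n) \mid Y(0,n))) - a\,\sigma_1^2 .$$

Under this assumed map, the first component of the prediction (4.25) therefore differs from the true conditional mean by $a\,\sigma_1^2$. Equality in (4.35) is recovered only when the conditional variance of $x_1(n)$ vanishes.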

Just as one can augment the Kalman filter equations to obtain a fixed-lag linear smoother, one can augment the extended Kalman filter equations to obtain a nonlinear smoother. In fact, the state and covariance update equations given in Table 3.2 for the fixed-lag linear smoother can be used unchanged for fixed-lag nonlinear smoothing. One only substitutes the parameter matrices $F_n$, $G_n$, and $H_n$ calculated for the EKF. In the following sections, we refer to the resulting nonlinear smoother as the extended Kalman smoother (EKS).
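One common way to realize a fixed-lag extended Kalman smoother is state augmentation, sketched below:

- The EKF is run on the stacked vector $[x(n); x(n-1); \ldots; x(n-L)]$.
- Block $i$ of the updated estimate is then $\hat{x}(n-i|n)$.

This is offered as an assumption-laden illustration, since Table 3.2 is not reproduced here and its equations may be organized differently. The names `augment` and `eks_step` are hypothetical.

```python
import numpy as np


def augment(F, G, H, L):
    """Augmented matrices for the stacked state [x(n); x(n-1); ...; x(n-L)]."""
    N = F.shape[0]
    M = N * (L + 1)
    Fa = np.zeros((M, M))
    Fa[:N, :N] = F
    for i in range(1, L + 1):
        # block i of the next stacked state is a copy of block i-1 of the current one
        Fa[i * N:(i + 1) * N, (i - 1) * N:i * N] = np.eye(N)
    Ga = np.zeros((M, G.shape[1]))
    Ga[:N] = G
    Ha = np.zeros((H.shape[0], M))
    Ha[:, :N] = H
    return Fa, Ga, Ha


def eks_step(Xa, Pa, y_next, f, jac_f, h, jac_h, G, Q, R, L):
    """One fixed-lag EKS cycle; block i of the returned state is x(n+1-i|n+1)."""
    N = len(Xa) // (L + 1)
    F = jac_f(Xa[:N])                                  # F_n at x(n|n)
    X_pred = np.concatenate([f(Xa[:N]), Xa[:-N]])      # (4.25) for block 0; shift the rest
    H = jac_h(X_pred[:N])
    Fa, Ga, Ha = augment(F, G, H, L)
    P_pred = Fa @ Pa @ Fa.T + Ga @ Q @ Ga.T
    K = P_pred @ Ha.T @ np.linalg.inv(Ha @ P_pred @ Ha.T + R)
    X_new = X_pred + K @ (y_next - h(X_pred[:N]))
    P_new = (np.eye(len(Xa)) - K @ Ha) @ P_pred
    return X_new, P_new


# Usage on a scalar random walk: the lag-1 (smoothed) variance never exceeds
# the filtered variance of the same state one step earlier.
rng = np.random.default_rng(1)
ident = lambda z: z
one = lambda z: np.eye(1)
L = 2
Xa, Pa = np.zeros(L + 1), np.eye(L + 1)
for _ in range(5):
    filtered_var = Pa[0, 0]                  # variance of x(n|n)
    Xa, Pa = eks_step(Xa, Pa, rng.standard_normal(1), ident, one, ident, one,
                      np.eye(1), np.eye(1), np.eye(1), L)
    print(f"P(n|n) = {filtered_var:.4f}  P(n|n+1) = {Pa[1, 1]:.4f}")
```

With $L = 0$ the augmentation is empty and `eks_step` reduces to a plain EKF step.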

4.3.2 Performance Results with Known System Dynamics

For the Kalman filter, one can evaluate the error covariance a priori. The EKF and EKS are not optimal, so their performance on a specific problem can be evaluated only with Monte Carlo simulations. In this section, we present and interpret experimental performance results obtained with the EKF and EKS on the Henon and Ikeda maps when the system dynamics (i.e., the function $f$) are known. All performance results are for the restricted DTS/DTO model given by (3.5) and (3.6). Two further restrictions apply: $h$ is the identity operator, and $R$, the covariance matrix of the observation noise, is diagonal. The results indicate that the EKF and EKS are extremely sensitive to the driving-noise covariance matrix $Q_n$ used in the filtering and smoothing equations.

When using the EKF and EKS, one implicitly assumes that the initial state $x(0)$ is a Gaussian random vector with mean vector $m_0$ and covariance matrix $P_0$. This is because, as indicated in Table 4.1, these values are used to initialize the filter. However, this is often an inappropriate assumption with dissipative, chaotic systems. A more appropriate assumption is often that the initial state is distributed according to the physical measure on the attractor, which generally has no corresponding PDF.

In light of this, and to allow comparison with the global estimator introduced later in this chapter, we initialized the filter differently for all performance results. The initial data point $y(0)$ was used as the initial updated estimate $\hat{x}(0|0)$, and the noise covariance matrix $R$ was used as the corresponding error covariance matrix $P(0|0)$. This initialization strategy is equivalent to assuming that $P(0|-1)$ is a diagonal matrix with infinite values on the diagonal, corresponding to complete uncertainty in the initial state. Note that using the initial observation to initialize the filter is possible only when the transformation $h$ is invertible. Although this is a suboptimal way to initialize the filter, the resulting performance was comparable to that obtained with more standard initialization methods.
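The claimed equivalence can be checked numerically. The sketch below applies the ordinary measurement update with $P(0|-1) = c\,I_N$ for increasing $c$. As $c$ grows, the updated estimate and covariance approach $y(0)$ and $R$. It assumes that $h$ is the identity (so $H = I_N$), and the values of $R$, $y(0)$, and $m_0$ are illustrative. The function name `measurement_update` is hypothetical.

```python
import numpy as np


def measurement_update(x_pred, P_pred, y, R):
    """Measurement update (4.27)-(4.29) for h = identity, so H = I."""
    K = P_pred @ np.linalg.inv(P_pred + R)
    x_upd = x_pred + K @ (y - x_pred)
    P_upd = (np.eye(len(x_pred)) - K) @ P_pred
    return x_upd, P_upd


R = np.diag([1e-2, 4e-2])      # illustrative diagonal observation-noise covariance
y0 = np.array([0.6, -0.2])     # illustrative first observation y(0)
m0 = np.zeros(2)               # any prior mean; its influence vanishes as c grows
for c in (1e0, 1e2, 1e6):
    x00, P00 = measurement_update(m0, c * np.eye(2), y0, R)
    print(f"c = {c:g}: x(0|0) = {x00}, diag P(0|0) = {np.diag(P00)}")
```

For a diagonal $R$ with entries $r_k$, each gain entry is $c/(c + r_k)$. The updated variance is therefore $c\,r_k/(c + r_k)$, which tends to $r_k$.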

Figures 4-12 (a) and (b) show the first set of performance results obtained with the EKS and known system dynamics. The performance curves are parameterized by the number of lags used by the EKS. For example, a lag of $i$ indicates that the results are those for the lagged estimator $\hat{x}(n-i|n)$. By definition, a lag of zero corresponds to the standard EKF. Each plotted point represents the average improvement in SNR over the two components of the state vector, obtained by estimating a 2000-point orbit segment.

The state equation (3.5) is deterministic, so there is no driving noise. Accordingly, the driving-noise covariance matrix was set to the zero matrix for all times $n$ in the EKF and EKS equations, so that $Q_n = Q = [0]$, and the matrix $G_n$ was set equal to the identity matrix $I_N$. The results for the Ikeda map are poor: the negative values indicate that the SNR has actually gotten worse. The results for the Henon map are mediocre except at larger input SNRs.

Figure 4-12: Performance results for EKS for different numbers of lags, known system dynamics, and $Q = [0]$. (a) Henon map; (b) Ikeda map.
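The performance measure is described only in words. The sketch below makes the following assumptions, which may not match the thesis's exact definitions:

- the SNR of a state component is its sample variance divided by the mean-squared error of the corresponding noisy or estimated sequence;
- the plotted gain is the output SNR minus the input SNR, averaged over the components.

The function names are hypothetical.

```python
import numpy as np


def snr_db(signal, error):
    """Per-component SNR in dB: sample variance of the signal over mean-squared error."""
    return 10.0 * np.log10(np.var(signal, axis=0) / np.mean(error ** 2, axis=0))


def average_snr_gain(x, y, x_hat):
    """Output SNR minus input SNR, averaged over the state components (dB)."""
    input_snr = snr_db(x, y - x)        # observation noise
    output_snr = snr_db(x, x_hat - x)   # estimation error
    return float(np.mean(output_snr - input_snr))


rng = np.random.default_rng(0)
x = rng.standard_normal((2000, 2))                 # stand-in for a 2000-point orbit segment
y = x + 0.1 * rng.standard_normal(x.shape)
print(average_snr_gain(x, y, x + 0.5 * (y - x)))   # error halved: gain of 10 log10(4) dB
print(average_snr_gain(x, y, x + 2.0 * (y - x)))   # error doubled: negative gain
```

Under these definitions the signal variance cancels, so the gain depends only on the ratio of noise power to estimation-error power. A negative value means the estimator increased the error, as seen for the Ikeda map.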

Figures 4-13 and 4-14 show the performance results obtained with nonzero, diagonal matrices $Q_n = Q$ in the EKF and EKS equations. The constant diagonal value $10^{-5}$ was used for Figures 4-13 (a) and (b), and the value $10^{-3}$ for Figures 4-14 (a) and (b). Note that although a nonzero covariance matrix $Q$ was used in the EKF and EKS equations, the data were still generated by the same deterministic state equation (3.5). In other words, the nonzero matrix $Q$ corresponds to a fictitious driving noise, or equivalently a conditioning noise used only in the estimator.

Overall, the performance results are better than those in Figure 4-12. However, especially for the Henon map, the improvement at some input SNRs comes with a degradation at other input SNRs. Also, as one might expect, the lagged estimators perform considerably better than the estimator with no lags, with the improvement apparently saturating at 4 lags.

Figure 4-13: Performance results for EKS with known system dynamics and $Q = 10^{-5} I_N$. (a) Henon map; (b) Ikeda map.

Figure 4-14: Performance results for EKS with known system dynamics and $Q = 10^{-3} I_N$. (a) Henon map; (b) Ikeda map.

A comparison of the performance results in Figures 4-12, 4-13, and 4-14 indicates that the EKS is extremely sensitive to the driving-noise covariance matrix $Q$ used in the smoothing equations. One way to circumvent this sensitivity is to carefully select a separate driving-noise covariance matrix $Q$ for each SNR. In practical applications, such tweaking would be undesirable and probably unacceptable, since the SNR may be unknown or time-varying. The performance results therefore suggest that straightforward application of extended Kalman filtering and smoothing to chaotic maps has little practical value. However, as we show in the next section, combining extended Kalman smoothing with orbit matching yields a potentially effective state estimator. This estimator can be used even if the system dynamics are not known.

4.3.3 Performance Results with Unknown System Dynamics

One can enhance the performance of the EKS while using less than full knowledge of the system dynamics. As we show in this section, incorporating orbit matching in the EKS yields an estimator with three advantages. It does not require full knowledge of the system dynamics, it is much less sensitive to the covariance matrix $Q$, and it performs better overall than the EKS alone.

As discussed in Section 4.3.1, the EKS requires that the system dynamics (i.e., the function $f$ in the state equation) be linearized about the estimated state $\hat{x}(n|n)$ at each time $n$. In particular, for the restricted DTS/DTO model of interest here, the filtering equations use the derivative matrix $F_n = D\{f(\hat{x}(n|n))\}$.

Suppose the function $f$ is not known but a sufficiently long reference orbit is available. One can then exploit the topological transitivity of chaotic systems, as was done earlier for ML estimation, to estimate the linearized dynamics at $\hat{x}(n|n)$. In other words, the reference orbit yields an affine parameter set $\{A_n, b_n\}$, where $A_n$ is an $N \times N$ matrix and $b_n$ is an $N$-element column vector. For points $z$ in a small neighborhood of $\hat{x}(n|n)$, this set satisfies

$$f(z) \approx A_n z + b_n. \qquad (4.36)$$

The matrix $A_n$ is an estimate of $F_n$, whereas the vector $b_n$ is an offset vector.

One simple but effective method for estimating the affine parameter set $\{A_n, b_n\}$ is the following. First, find the nearest neighbors to $\hat{x}(n|n)$ in the reference orbit (as determined by an appropriate metric), together with their immediate successors, i.e., the reference orbit points that immediately follow them. Let $\{o_i\}_{i=1}^{N(n)}$ denote the $N(n)$ orbit points selected as nearest neighbors. Let $\{p_i\}_{i=1}^{N(n)}$ denote their successors, so that $p_i = f(o_i)$. Next, fit an affine map to these neighbors and successors by least squares. Specifically, find the matrix $A_n$ and vector $b_n$ that minimize the total prediction error

$$\mathcal{E} = \sum_{i=1}^{N(n)} \| p_i - (A_n o_i + b_n) \|^2. \qquad (4.37)$$

An underlying assumption of this technique is that the linearized system dynamics change little among points in small neighborhoods of $\hat{x}(n|n)$. This assumption is reasonable because this section already requires $f$ to be differentiable at all points on the chaotic attractor. This method of using a reference orbit to obtain locally linearized estimates of chaotic dynamics was apparently first proposed in [21]. It was developed further in [22], primarily in the context of prediction, and has since been used extensively for various applications (e.g., [23, 45]).
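The neighbor search and the least-squares fit (4.37) can be sketched in NumPy as follows. The sketch assumes Euclidean nearest neighbors and uses the standard Henon map ($a = 1.4$, $b = 0.3$) only to generate a reference orbit. The function names are hypothetical. Appending a column of ones to the regressors lets a single `np.linalg.lstsq` call return both $A_n$ and $b_n$.

```python
import numpy as np


def henon(x, a=1.4, b=0.3):
    """Standard Henon map, used here only to generate a reference orbit."""
    return np.array([1.0 - a * x[0] ** 2 + x[1], b * x[0]])


def henon_orbit(n, discard=100):
    """n points of a Henon orbit started at the origin, after a discarded transient."""
    x = np.zeros(2)
    for _ in range(discard):
        x = henon(x)
    out = np.empty((n, 2))
    for i in range(n):
        out[i] = x
        x = henon(x)
    return out


def local_affine_fit(ref, z, k):
    """Estimate {A_n, b_n} in (4.36) from the k Euclidean nearest neighbors of z in ref."""
    d = np.linalg.norm(ref[:-1] - z, axis=1)      # the last point has no successor
    idx = np.argsort(d)[:k]
    o, p = ref[idx], ref[idx + 1]                 # neighbors o_i and successors p_i = f(o_i)
    X = np.hstack([o, np.ones((k, 1))])           # regressor rows [o_i^T, 1]
    coef, *_ = np.linalg.lstsq(X, p, rcond=None)  # minimizes the error (4.37)
    return coef[:-1].T, coef[-1]                  # A_n (N x N) and b_n (N,)


ref = henon_orbit(4000)
z = ref[500]
A, b = local_affine_fit(ref, z, 15)
print(A)                      # compare with D{f(z)} = [[-2.8 z_1, 1], [0.3, 0]]
print(A @ z + b - henon(z))   # one-step prediction error at z
```

Because the second component of the Henon map is exactly linear, the second row of $A_n$ is recovered essentially exactly. The first row is only a local approximation of the quadratic term.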

Having obtained the affine parameter set $\{A_n, b_n\}$, one can perform extended Kalman filtering or smoothing at time $n$ with two changes. First, substitute $A_n$ for $F_n$ in the filtering and smoothing equations. Second, redefine the predicted state estimate $\hat{x}(n+1|n)$ of (4.25) as

$$\hat{x}(n+1|n) = A_n \hat{x}(n|n) + b_n. \qquad (4.38)$$

In summary, we have the following algorithm for extended Kalman smoothing when the system dynamics are unknown but a noise-free reference orbit is available.

EKS with Noise-Free Reference Orbit

1. Given the present state estimate $\hat{x}(n|n)$, find the nearest neighbors to $\hat{x}(n|n)$ in the reference orbit. Let $N(n)$ denote the number of selected neighbors.

2. Using the nearest neighbors and their immediate successors in the reference orbit, determine the affine parameter set $\{A_n, b_n\}$ that minimizes the one-step total squared prediction error over the $N(n)$ chosen neighbors and their successors. Here $A_n$ is an $N \times N$ matrix and $b_n$ is an $N$-element column vector.

3. Use the affine mapping determined by $\{A_n, b_n\}$ as an estimate of the system dynamics at $\hat{x}(n|n)$ in the extended Kalman filtering and smoothing equations. In particular, substitute $A_n$ for $F_n$ in these equations, and substitute (4.38) for (4.25) to obtain the predicted estimate $\hat{x}(n+1|n)$.

4. Repeat the procedure until all observations have been processed.
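The four steps can be sketched for the zero-lag case (a filter rather than a smoother) under these assumptions:

- $h$ is the identity;
- $G_n = I_N$;
- neighbors are found with the Euclidean metric;
- the reference orbit and the observed segment come from the standard Henon map.

The names are hypothetical. A lagged version would combine this prediction with a fixed-lag smoother.

```python
import numpy as np


def henon_orbit(n, a=1.4, b=0.3, discard=100):
    """n points of a Henon orbit started at the origin, after a discarded transient."""
    x = np.zeros(2)
    out = np.empty((n, 2))
    for i in range(discard + n):
        if i >= discard:
            out[i - discard] = x
        x = np.array([1.0 - a * x[0] ** 2 + x[1], b * x[0]])
    return out


def local_affine_fit(ref, z, k):
    """Least-squares affine fit (4.37) from the k Euclidean nearest neighbors of z."""
    idx = np.argsort(np.linalg.norm(ref[:-1] - z, axis=1))[:k]
    X = np.hstack([ref[idx], np.ones((k, 1))])
    coef, *_ = np.linalg.lstsq(X, ref[idx + 1], rcond=None)
    return coef[:-1].T, coef[-1]


def ekf_step_reference(x_upd, P_upd, y_next, ref, k, Q, R):
    """Steps 1-3 with zero lag, h = identity (H = I) and G_n = I."""
    A, b = local_affine_fit(ref, x_upd, k)      # steps 1 and 2
    x_pred = A @ x_upd + b                      # (4.38) in place of (4.25)
    P_pred = A @ P_upd @ A.T + Q                # A_n in place of F_n in (4.26)
    K = P_pred @ np.linalg.inv(P_pred + R)
    x_new = x_pred + K @ (y_next - x_pred)
    P_new = (np.eye(len(x_upd)) - K) @ P_pred
    return x_new, P_new


orbit = henon_orbit(4300)
ref, truth = orbit[:4000], orbit[4000:]   # noise-free reference orbit and a later segment
rng = np.random.default_rng(0)
sigma = 0.01
y = truth + sigma * rng.standard_normal(truth.shape)
R, Q = sigma ** 2 * np.eye(2), 1e-5 * np.eye(2)
x_hat, P = y[0], R.copy()
est = [x_hat]
for n in range(1, len(y)):                # step 4: repeat until all observations are used
    x_hat, P = ekf_step_reference(x_hat, P, y[n], ref, 15, Q, R)
    est.append(x_hat)
est = np.array(est)
```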


Several parameters are associated with the above algorithm, such as the number of nearest neighbors used at each time $n$ and the size of the reference orbit. In addition, some fundamental elements of the algorithm have been left unspecified, including the criterion or metric used for selecting nearest neighbors. A logical choice of metric is the Euclidean distance between each point in the reference orbit and the current state estimate.

Figures 4-15 and 4-16 show the performance results obtained with the Euclidean metric for selecting nearest neighbors. For each figure, a 4000-point reference orbit and 15 nearest neighbors were used to estimate the affine parameters at each time $n$. In addition, nonzero covariance matrices $Q$ were used in the filtering and smoothing equations. Figures 4-15 and 4-16 used the same matrices as Figures 4-13 and 4-14, respectively. Overall, the results are disappointing, except for the Henon map at larger input SNRs. They are also comparable to those in Figures 4-13 and 4-14, suggesting that the aspects of the system dynamics used by the estimator are available in the reference orbit.

Figure 4-15: Performance results for EKS with 4000-point reference orbit, 15 nearest neighbors, and $Q = 10^{-5} I_N$. (a) Henon map; (b) Ikeda map.

Figure 4-16: Performance results for EKS with 4000-point reference orbit, 15 nearest neighbors, and $Q = 10^{-3} I_N$. (a) Henon map; (b) Ikeda map.

As when the system dynamics are known, one can improve the performance of the EKS with unknown dynamics by selecting a separate matrix $Q$ for each input SNR by trial and error. However, one can avoid such undesirable parameter tweaking by using an orbit-matching criterion to select the nearest neighbors for each affine parameter set. Three groups of quantities are matched, in a weighted least-squares sense, against orbit segments in the reference orbit:

- the current state estimate $\hat{x}(n|n)$;
- the previous $m$ state estimates $\hat{x}(n-j|n-j)$, $j = 1, \ldots, m$;
- the next $r$ observations $y(n+k)$, $k = 1, \ldots, r$.

One then chooses as nearest neighbors the reference orbit points whose orbit segments match most closely. Mathematically, when $h$ is the identity operator, one chooses at time $n$ the reference orbit points $o_i$ for which the following error criterion is smallest:


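$$\sum_{j=0}^{m} \big(f^{-j}(o_i) - \hat{x}(n-j|n-j)\big)^T W^{-1} \big(f^{-j}(o_i) - \hat{x}(n-j|n-j)\big) + \sum_{k=1}^{r} \big(f^{k}(o_i) - y(n+k)\big)^T R^{-1} \big(f^{k}(o_i) - y(n+k)\big). \qquad (4.39)$$

If $h$ is not the identity operator, the following criterion is appropriate:

$$\sum_{j=0}^{m} \big(f^{-j}(o_i) - \hat{x}(n-j|n-j)\big)^T W^{-1} \big(f^{-j}(o_i) - \hat{x}(n-j|n-j)\big) + \sum_{k=1}^{r} \big(h(f^{k}(o_i)) - y(n+k)\big)^T R^{-1} \big(h(f^{k}(o_i)) - y(n+k)\big). \qquad (4.40)$$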


In the above summations, $f^{-j}(o_i)$ denotes the $j$th reference orbit point before $o_i$, and $f^k(o_i)$ denotes the $k$th reference orbit point after $o_i$. The matrix $W$ is a weighting matrix that one must select. In the examples that follow, all of which use the identity operator for $h$, $W$ was set equal to $R$. Other choices for $W$ may have been equally or more appropriate.
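The criterion (4.39) can be evaluated for all candidate reference points at once with `np.einsum`. The sketch below makes these assumptions:

- $h$ is the identity and $W = R$, as in the examples;
- reference points whose full window does not fit inside the reference orbit are skipped;
- the reference orbit comes from the standard Henon map.

The function names are hypothetical.

```python
import numpy as np


def henon_orbit(n, a=1.4, b=0.3, discard=100):
    """n points of a Henon orbit started at the origin, after a discarded transient."""
    x = np.zeros(2)
    out = np.empty((n, 2))
    for i in range(discard + n):
        if i >= discard:
            out[i - discard] = x
        x = np.array([1.0 - a * x[0] ** 2 + x[1], b * x[0]])
    return out


def orbit_match_neighbors(ref, past_est, future_obs, W, R, count):
    """Indices i of the `count` points o_i = ref[i] with the smallest criterion (4.39).

    past_est[j] holds x(n-j|n-j) for j = 0..m; future_obs[k-1] holds y(n+k), k = 1..r.
    """
    m, r = len(past_est) - 1, len(future_obs)
    W_inv, R_inv = np.linalg.inv(W), np.linalg.inv(R)
    cand = np.arange(m, len(ref) - max(r, 1))   # full window and a successor must exist
    cost = np.zeros(len(cand))
    for j in range(m + 1):                      # f^{-j}(o_i) is ref[i - j]
        e = ref[cand - j] - past_est[j]
        cost += np.einsum("ki,ij,kj->k", e, W_inv, e)
    for k in range(1, r + 1):                   # f^{k}(o_i) is ref[i + k]
        e = ref[cand + k] - future_obs[k - 1]
        cost += np.einsum("ki,ij,kj->k", e, R_inv, e)
    return cand[np.argsort(cost)[:count]]


ref = henon_orbit(4000)
t, m, r = 1500, 1, 4
past = ref[t - np.arange(m + 1)]       # noise-free stand-ins for x(n-j|n-j)
future = ref[t + 1:t + r + 1]          # noise-free stand-ins for y(n+1), ..., y(n+r)
R = 1e-4 * np.eye(2)
nbrs = orbit_match_neighbors(ref, past, future, R, R, 15)   # W = R
print(nbrs[:5])
```

With noise-free inputs taken from the reference orbit itself, the segment starting at index `t` matches exactly and is ranked first.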

Figures 4-17 and 4-18 show the performance results obtained with two different choices of the parameter pair $(m, r)$ used in the above criterion for selecting nearest neighbors. Additional computer experiments have suggested that there is no optimal selection of $m$ and $r$ and that various combinations yield comparable results. The plotted results indicate that this alternative criterion for selecting nearest neighbors yields an effective state estimator, at least for the chaotic maps chosen. This estimator is superior to the straightforward EKS with known system dynamics.

Figure 4-17: Performance results for EKS with 4000-point reference orbit, 15 nearest neighbors, $Q = 10^{-5} I_N$, and $(m, r) = (3, 3)$. (a) Henon map; (b) Ikeda map.

Figure 4-18: Performance results for EKS with 4000-point reference orbit, 15 nearest neighbors, $Q = 10^{-5} I_N$, and $(m, r) = (1, 4)$. (a) Henon map; (b) Ikeda map.

As one would expect, the size of the reference orbit and the number of nearest neighbors used for estimating the affine parameters strongly influence performance. Figures 4-19 and 4-20 depict the performance results obtained for differently sized reference orbits and different numbers of nearest neighbors, respectively. Both sets of figures use a constant estimator lag of 4 and the parameter pair $(m, r) = (1, 4)$. Figures 4-19 (a) and (b) indicate that performance improves steadily as the reference orbit grows from 500 to 4000 points. Figures 4-20 (a) and (b) indicate that there is no simple relation between neighborhood size and estimator performance. Performance apparently degrades once the number of nearest neighbors exceeds some system-dependent threshold.

Figure 4-19: Performance results for EKS with 4 lags, 15 nearest neighbors, $Q = 10^{-5} I_N$, and $(m, r) = (1, 4)$. (a) Henon map; (b) Ikeda map.

Figure 4-20: Performance results for EKS with 4 lags, 4000-point reference orbit, $Q = 10^{-5} I_N$, and $(m, r) = (1, 4)$. (a) Henon map; (b) Ikeda map.

Surprisingly, this approach, which combines affine parameter estimation with extended Kalman smoothing, provides a nonnegligible performance gain even when no reference orbit is available. This requires that the transformation $h$ be the identity operator and that the observation set be used as the reference orbit in the algorithm outlined above. Figures 4-21 (a) and (b) show the results obtained by estimating the first 1000 points of a 2000-point observation set while using the entire observation set as the reference orbit. Although the performance is not as good as with a noise-free reference orbit, there is still a nonnegligible SNR gain at each input SNR. Equally surprising is that the SNR gain is nearly the same for different numbers of lags. The plotted gain is not the best achievable at each input SNR. Larger gains are obtainable if one carefully selects the noise covariance matrix $Q$ and the parameter pair $(m, r)$ at each input SNR level.

Figure 4-21: Performance results for EKS with no reference orbit, 2000 observations, $Q = 10^{-4} I_N$, and $(m, r) = (1, 4)$. (a) Henon map; (b) Ikeda map.

When no reference orbit is available, a natural question is whether iterating the estimator on the observation set improves performance further, and by how much. That is, does performance continue to improve if the state estimates from each iteration serve as the observation set for the next? Computer experiments have suggested that one additional iteration yields a further 2-4 dB of SNR gain, but that further iterations add little.

A relevant consideration discussed in [27] is that least-squares estimation of affine parameters in linear regression is appropriate only when the dependent variables are noise-corrupted and the independent variables are not. For the affine parameter estimation problem here, both the dependent and independent variables come from the reference orbit. With a noise-free reference orbit, both sets of variables are noise-free. When the observation set serves as the reference orbit, however, both sets are noise-corrupted. In that case, the least-squares method is, strictly speaking, inappropriate. A measurement-error-model (equivalently, total-least-squares) method [27, 31, 32, 81] is theoretically more appropriate. We used such an approach for the results presented in [77] in the context of fixed-interval smoothing.

However, more recent computer experiments told a different story when orbit matching was used to select nearest neighbors. In that setting, the measurement-error-model method offered little if any overall improvement over the simpler least-squares method. It improved performance at some input SNRs and degraded it at others. More importantly, it was much more sensitive than the least-squares method to filter parameters such as $Q$ and the observation set size. Greater sensitivity to the observation set size is not surprising, since measurement-error-model methods often depend strongly on the size of the data set used for estimating the parameters. Given this greater sensitivity, and the comparable performance of both methods without parameter fine-tuning, the simpler least-squares method was used for the above examples.
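The difference between the two fitting methods can be illustrated with a small sketch. It compares ordinary least squares with one common form of total least squares, in which the intercept is handled by centering and the slope comes from the SVD of the centered data matrix. The local linearization, noise level, and function names are illustrative assumptions, not the method of [77].

```python
import numpy as np


def ls_affine(X, Y):
    """Ordinary least squares for Y ~ X A^T + b (noise assumed only in Y)."""
    Z = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return coef[:-1].T, coef[-1]


def tls_affine(X, Y):
    """Total least squares for Y ~ X A^T + b (noise in both X and Y).

    Center both data sets, then take the right singular vectors of [Xc Yc]
    that belong to its N smallest singular values.
    """
    N = X.shape[1]
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(np.hstack([X - mx, Y - my]))
    V = Vt.T
    V12, V22 = V[:N, N:], V[N:, N:]
    B = -V12 @ np.linalg.inv(V22)        # solves Yc ~ Xc B in the TLS sense
    A = B.T
    return A, my - A @ mx


# Usage: noisy neighbors and successors, as when the observations are the reference orbit.
rng = np.random.default_rng(0)
A_true = np.array([[-1.2, 1.0], [0.3, 0.0]])     # illustrative local linearization
b_true = np.array([1.0, 0.0])
X_clean = 0.05 * rng.standard_normal((15, 2))    # 15 neighbors in a small neighborhood
Y_clean = X_clean @ A_true.T + b_true
X_noisy = X_clean + 0.005 * rng.standard_normal(X_clean.shape)
Y_noisy = Y_clean + 0.005 * rng.standard_normal(Y_clean.shape)
for fit in (ls_affine, tls_affine):
    A, b = fit(X_noisy, Y_noisy)
    print(fit.__name__, np.round(A, 3))
```

With noise-free data, both methods recover the same affine map exactly. With noise in the regressors, ordinary least squares is biased toward smaller slopes. The TLS estimate, in turn, depends more heavily on the number of neighbors, which fits the sensitivity noted above.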


4.4 MMSE State Estimation: Global Techniques

The state estimators discussed thus far all require a search for nearest neighbors. In this section, we introduce a global, approximate MMSE state estimator that avoids this computationally intensive step. The estimator is applicable when the initial condition $x(0)$ is a random vector distributed according to the physical measure on the attractor. Like the state estimators discussed earlier, it uses a noise-free reference orbit.
As discussed in Chapter 3, for a given observation set $Y$, the MMSE state estimator for $x(n)$ is the conditional mean. It is given by (3.16) if the a posteriori density $p(x(n)|Y)$ exists and by (3.19) otherwise, with the latter equation repeated here for reference:

$$E(x(n) \mid Y) = \frac{\int x(n)\, p(Y \mid x(n))\, d\mu_{x(n)}}{\int p(Y \mid x(n))\, d\mu_{x(n)}}, \qquad (4.41)$$


where $\mu_{x(n)}$ denotes the measure corresponding to the distribution of the state $x(n)$ at time $n$, and the integration over $x(n)$ is defined in the Lebesgue sense. We know from Chapter 2 that the physical measure on the attractor is an invariant, ergodic measure. Hence, if the initial state is distributed according to this measure, so is the state at every future time. Combining this invariance with ergodicity, suppose the initial state $x(0)$ is distributed according to the physical measure. Then the following holds for the state $x(n)$ at each time $n$, for almost all points $z$ on the chaotic attractor:

$$\frac{\int x(n)\, p(Y \mid x(n))\, d\mu_{x(n)}}{\int p(Y \mid x(n))\, d\mu_{x(n)}} = \frac{\lim_{N \to \infty} \frac{1}{N} \sum_{j=0}^{N-1} f^j(z)\, p_n(Y \mid f^j(z))}{\lim_{M \to \infty} \frac{1}{M} \sum_{j=0}^{M-1} p_n(Y \mid f^j(z))}, \qquad (4.42)$$


where $p_n(Y|f^j(z))$ denotes the PDF of the observation set conditioned on $f^j(z)$ being the value of the state at time $n$. Thus, the conditional mean is simply a weighted average of points on a chaotic orbit. The weight for each point is the value of the likelihood function conditioned on that point.

The above summation is not useful for practical MMSE state estimation, for two reasons. First, evaluating it requires calculating an infinite number of terms. Second, as noted in Section 4.2, the likelihood function rapidly becomes impulse-like as the size of the observation set $Y$ increases. Suppose one tried to approximate the expression by summing over a finite number of terms. If $Y$ contained more than a few observations, the likelihood for each term would then be zero for all practical purposes.

However, a practical approximation to (4.42) yields a potentially effective state estimator. It involves two steps:

1. Approximate the infinite sums in the numerator and denominator with finite sums of the same length, i.e., set $M = N$.
2. Approximate the likelihood function $p_n(Y|f^i(z))$ with $p_n(Y(n-m, n+r)|f^i(z))$, i.e., use only observations at times near the time of interest.

If $r = 0$ the estimator is a state filter, whereas if $r > 0$ it is a state smoother. Combining these two approximations yields the following global state estimator $\hat{x}(n)$ for the state at time $n$:

$$\hat{x}(n) = \frac{\sum_{i=0}^{N-1} f^i(z)\, p_n(Y(n-m, n+r) \mid f^i(z))}{\sum_{i=0}^{N-1} p_n(Y(n-m, n+r) \mid f^i(z))}, \qquad (4.43)$$

where $z$ is an arbitrarily selected point on the chaotic attractor, excepting possibly a set of measure zero. For the results reported here, the set $\{f^i(z)\}_{i=0}^{N-1}$ corresponds to a reference orbit segment.
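The estimator (4.43) amounts to a likelihood-weighted average of reference orbit points. The sketch below makes three assumptions:

- $h$ is the identity and the observation noise is i.i.d. Gaussian with known standard deviation $\sigma$;
- only reference indices with $m$ predecessors and $r$ successors inside the reference orbit are used, a simplification of the sum over $i = 0, \ldots, N-1$;
- the data come from the standard Henon map.

The function names are hypothetical. Subtracting the largest log-likelihood before exponentiating sidesteps the numerical underflow mentioned above, although it does not change the impulse-like character of the weights.

```python
import numpy as np


def henon_orbit(n, a=1.4, b=0.3, discard=100):
    """n points of a Henon orbit started at the origin, after a discarded transient."""
    x = np.zeros(2)
    out = np.empty((n, 2))
    for i in range(discard + n):
        if i >= discard:
            out[i - discard] = x
        x = np.array([1.0 - a * x[0] ** 2 + x[1], b * x[0]])
    return out


def global_estimate(ref, Y, n, m, r, sigma):
    """Approximate conditional mean (4.43) of x(n) given Y(n-m, n+r).

    ref[i] plays the role of f^i(z); the window of candidate i is ref[i-m .. i+r].
    """
    window = Y[n - m:n + r + 1]                                   # y(n-m), ..., y(n+r)
    idx = np.arange(m, len(ref) - r)
    cand = np.stack([ref[idx + k] for k in range(-m, r + 1)], axis=1)
    loglik = -0.5 * np.sum((cand - window) ** 2, axis=(1, 2)) / sigma ** 2
    w = np.exp(loglik - loglik.max())        # rescale before exponentiating
    return (w / w.sum()) @ ref[idx]


orbit = henon_orbit(4500)
ref, truth = orbit[:4000], orbit[4000:]      # reference orbit (N = 4000) and a later segment
rng = np.random.default_rng(0)
sigma = 0.02
Y = truth + sigma * rng.standard_normal(truth.shape)
est = np.array([global_estimate(ref, Y, n, 3, 3, sigma) for n in range(3, len(Y) - 3)])
err = est - truth[3:len(Y) - 3]
print("noise std:", sigma, " rms estimation error:", np.sqrt(np.mean(err ** 2)))
```

With $(m, r) = (3, 3)$ this is a smoother. Setting `r = 0` gives the filtered estimate.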

Because the estimator uses a rectangular window of observations, the observation set at time $n+1$ differs from that at time $n$ by two points. The oldest observation $y(n-m)$ is dropped and the new observation $y(n+r+1)$ is added. Furthermore, the following relation holds:

$$p_{n+1}(Y(n-m, n+r) \mid f^i(z)) = p_n(Y(n-m, n+r) \mid f^{i-1}(z)). \qquad (4.44)$$

Therefore, if the reference orbit is allowed to grow by one point at each time $n$, one can reduce the computational burden. The likelihoods $p_{n+1}(Y(n-m+1, n+r+1)|f^i(z))$ at time $n+1$ can then be derived from the likelihoods at time $n$. Equivalently, suppose the $N$ likelihoods $\{p_n(Y(n-m, n+r)|f^i(z))\}_{i=0}^{N-1}$ are used in (4.43) at time $n$. One can then save computation at time $n+1$ by using the likelihoods $\{p_{n+1}(Y(n-m+1, n+r+1)|f^i(z))\}_{i=1}^{N}$.
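The saving implied by (4.44) can be sketched in terms of Gaussian log-likelihoods, assuming $h$ is the identity and i.i.d. Gaussian noise. Moving from time $n$ to $n+1$, candidate $i$ inherits the window log-likelihood of candidate $i-1$ at time $n$. Only the term for the dropped observation $y(n-m)$ and the term for the added observation $y(n+r+1)$ change. Each update therefore costs two terms instead of $m + r + 1$. The first admissible index has no predecessor at time $n$ and is left unfilled here. The function names and the random test data are hypothetical.

```python
import numpy as np


def window_loglik(Y, ref, n, m, r, sigma):
    """Direct Gaussian log-likelihood (up to a constant) of Y(n-m, n+r) for each candidate
    ref[i] taken as the state at time n; candidates without a full window get -inf."""
    L = len(ref)
    out = np.full(L, -np.inf)
    for i in range(m, L - r):
        d = Y[n - m:n + r + 1] - ref[i - m:i + r + 1]
        out[i] = -0.5 * np.sum(d ** 2) / sigma ** 2
    return out


def slide_loglik(prev, Y, ref, n, m, r, sigma):
    """Time n+1 log-likelihoods from the time-n values via the index shift of (4.44):
    drop the term for y(n-m) and add the term for y(n+r+1)."""
    L = len(ref)
    out = np.full(L, -np.inf)
    for i in range(m + 1, L - r):
        drop = Y[n - m] - ref[i - 1 - m]
        add = Y[n + r + 1] - ref[i + r]
        out[i] = prev[i - 1] - 0.5 * (add @ add - drop @ drop) / sigma ** 2
    return out


rng = np.random.default_rng(0)
Y, ref = rng.standard_normal((40, 2)), rng.standard_normal((300, 2))
n, m, r, sigma = 10, 2, 3, 0.5
direct = window_loglik(Y, ref, n + 1, m, r, sigma)
updated = slide_loglik(window_loglik(Y, ref, n, m, r, sigma), Y, ref, n, m, r, sigma)
print(np.max(np.abs(direct[m + 1:len(ref) - r] - updated[m + 1:len(ref) - r])))
```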

An alternative to a fixed-size, rectangular window of observations at each time $n$ is a growing window with an exponential weighting of past observations, as is often done in recursive filtering applications.

Figures  4-22  and  4-23  depict  the  performance  results  obtained  with  this  approach  on 
the  Henon  and  Ikeda  maps.  In  both  sets  of  figures,  the  curves  are  parameterized  by  the  pair 
(m,  r)  denoting  the  number  of  past  and  future  observations  used  in  the  likelihood  function 
at each time. Also, for both sets of figures, N was set equal to 4000 in (4.43). In Figures 4-22 (a) and (b), r = 0; thus, only the present and past observations are used in the likelihood functions, resulting in filtered state estimates. In contrast, in Figures 4-23 (a) and (b), r > 0; thus, future observations are also used in the likelihood functions, resulting in smoothed state estimates. A comparison of the figures reveals superior performance with smoothing over filtering. A striking feature of the smoothed results is the considerable performance gain even at small input SNRs.
Figure 4-22: Performance results for the global estimator with N = 4000 and different parameter pairs (m, r), with r = 0 (filtering). (a) Henon map; (b) Ikeda map.


Figure 4-23: Performance results for the global estimator with N = 4000 and different parameter pairs (m, r), with r > 0 (smoothing). (a) Henon map; (b) Ikeda map.
Figures 4-24 (a) and (b) show the performance results obtained with fixed m and r
and  different  values  of  N.  As  might  be  expected,  performance  improves  as  N  increases, 
especially  at  larger  input  SNRs. 


Figure  4-24:  Performance  results  for  global  estimator  with  (m,  r)  =  (3,3)  and  different 
values  for  N.  (a)  Henon  map;  (b)  Ikeda  map. 


4.5  Comparison  of  Estimators 

The question arises as to which of the three state estimators introduced in this chapter (the approximate ML estimator, the EKS, or the approximate global MMSE estimator) is the best. In general, it is inappropriate to compare estimators for nonrandom parameters with estimators for random parameters, as the underlying problem scenarios are fundamentally different. However, in light of the assumption used in this chapter on the a priori state distribution, namely that it is given by the physical measure on the attractor, and in light of the heuristic nature of the estimators, such a comparison is at least partially justified here.

A  comparison  of  the  performance  results  for  the  three  state  estimators  suggests  that  the 
ML  and  global  MMSE  estimators  perform  comparably,  although  the  experimental  results 
with the MMSE estimator are generally more consistent. In addition, both estimators considerably outperform the EKS with smaller input SNRs, but the EKS is the superior estimator with larger input SNRs. The poor performance of the EKS with smaller input
SNRs  and  markedly  better  performance  with  larger  input  SNRs  is  typical  performance  for 
a  local,  MMSE  state  estimator. 

As to the question of which estimator is best, there is no simple answer. Two appealing aspects of the global MMSE estimator are its ease of implementation and its consistent performance when applied to both the Henon and Ikeda maps. Furthermore, performance with larger input SNRs is limited only by the number of terms N used in (4.42). Additional
experiments  with  larger  input  SNRs  have  suggested  that  performance  continues  to  improve 
as  N  increases  beyond  4000. 

One  appealing  aspect  of  the  EKS  is  its  potential  value  for  self-cleaning,  as  suggested 
by  the  performance  results  depicted  in  Figure  4-21.  Another  appealing  aspect  is  that  its 
performance  continues  to  improve  as  the  input  SNR  increases  beyond  20  dB.  However,  two 
unappealing  aspects  of  the  EKS  are  its  mediocre  performance  with  extremely  small  input 
SNRs  and  the  large  number  of  parameters  one  must  specify  when  using  it.  Nonetheless, 
the  performance  results  presented  in  this  chapter  suggest  that  both  the  EKS  and  the  global 
MMSE  estimator  are  potentially  effective  state  estimators  with  chaotic  systems. 
Chapter 5


Bounds on State Estimator Performance


5.1  Introduction 


As  noted  in  Chapters  3  and  4,  practical  state  estimators  for  nonlinear  systems  are  often 
heuristic, and Monte Carlo simulation is needed to assess their performance. When attempting to design and refine state estimators for a given estimation problem, one often
has  no  way  of  knowing  if  poor  performance  of  a  state  estimator  is  due  to  the  estimator  or  to 
a  fundamental  aspect  of  the  problem  itself.  As  such,  it  is  often  useful  and  desirable  to  know 
the  best  performance  achievable  by  any  state  estimator  for  a  given  estimation  problem,  or 
equivalently  to  have  upper  bounds  on  achievable  state  estimator  performance.  Ideally,  these 
bounds  should  be  “tight”  in  the  sense  that  one  could  derive,  at  least  in  theory,  an  estimator 
with  performance  achieving  the  bounds.  However,  just  as  deriving  practical,  optimal  state 
estimators  is  often  an  elusive  goal,  assessing  the  tightness  of  a  given  performance  bound  is 
often  an  elusive  goal  as  well. 

In  this  chapter,  we  present  and  analyze  computer  simulations  of  several  bounds  on  the 
performance  of  state  estimators  for  chaotic  systems.  The  simulations  and  analysis  indicate 
that  the  Lyapunov  exponents  of  chaotic  systems  strongly  affect  the  achievable  performance 
of  state  estimators  for  these  systems,  and  that  for  a  dissipative,  chaotic  diffeomorphism, 
there  is  a  positive  lower  bound  on  the  achievable  total  error  variance  when  estimating  the 
state at a given time $n_0$ using observations only for times before $n_0$ or only for times after $n_0$. The simulations suggest that the behavior of the Cramer-Rao bound on state estimator performance for a dissipative, chaotic diffeomorphism is similar to that for an unstable, linear, time-invariant (LTI) system for which the eigenvalues of the state transition matrix are given by the exponentials of the Lyapunov exponents of the chaotic system. This result may not be surprising, since the definitions of both the Cramer-Rao bound and the Lyapunov exponents involve linearizations of the system dynamics. However, the simulations
also  reveal  that  this  result  holds  only  when  the  unknown  state  being  estimated  is  treated 
as  a  nonrandom  parameter  vector,  and  that  the  Cramer-Rao  bound  and  generalizations  of 
this  bound  on  state  estimator  performance  when  the  unknown  state  is  treated  as  a  random 
vector  with  known  a  priori  PDF  may  provide  little  if  any  useful  information  for  dissipative, 
chaotic  systems.  The  simulations  also  reveal  the  weakness  of  the  Cramer-Rao  bound  for 
noninvertible,  chaotic  systems,  such  as  the  ones  considered  in  Chapter  6,  even  at  moderate 
input  SNRs,  and  they  suggest  the  value  of  a  generalization  of  the  Cramer-Rao  bound  known 
as  the  Barankin  bound  for  these  systems. 

The  bounds  we  use  in  this  chapter  are  not  new,  and  they  have  been  used  in  the  past  by 
others  for  parameter  estimation  problems.  However,  to  the  best  of  our  knowledge,  they  have 
never  been  used  for  the  types  of  deterministic  systems  of  interest  in  this  thesis.  In  addition, 
the  close  relation  between  the  behavior  of  these  bounds  for  multidimensional,  nonlinear 
systems  and  the  Lyapunov  exponents  of  these  systems  has  apparently  not  been  explored  in 
the  past,  at  least  not  for  the  problem  scenario  focused  on  in  this  chapter  involving  small  to 
moderate  input  SNRs,  nonlinear,  deterministic  system  dynamics,  and  unknown,  nonrandom 
state  vectors. 

We  focus  on  the  state  estimation  scenario  involving  unknown,  nonrandom  state  vectors 
in  part  because  of  the  undesirable  behavior  of  the  performance  bounds  for  random  state 
vectors,  as  illustrated  in  the  final  section  of  the  chapter.  As  a  consequence,  many  of  the 
results  presented  in  this  chapter  strictly  apply  only  to  unbiased  state  estimators.  The 
challenging  task  of  possibly  extending  the  results  to  biased  state  estimators  remains  a  topic 
for  future  research,  since  it  entails  the  discovery  of  new  error  bounds  applicable  to  random 
parameter  vectors. 

The next section briefly discusses the estimation problem of interest in this chapter and the performance measures we seek to bound. The section also briefly reviews the two general performance bounds emphasized in the chapter: the Cramer-Rao bound and the Barankin bound. The derivation of the specific form of these bounds for the estimation problem of interest in this chapter is provided in Appendix A. Section 5.3 provides computer simulations of these bounds for various chaotic systems. In particular, Subsection 5.3.1 presents and qualitatively analyzes simulations of the bounds for two dissipative, chaotic diffeomorphisms: the Henon map and the time-sampled Lorenz flow. The simulations reveal the fundamental influence of system Lyapunov exponents and attractor boundedness on achievable state estimator performance with these systems. Subsection 5.3.2 reveals the weakness of the Cramer-Rao bound when applied to noninvertible, chaotic systems even with moderate input SNRs, and the value of the Barankin bound for use with these systems. In contrast to Sections 5.2 and 5.3, which deal with the problem of estimating nonrandom state vectors, Section 5.4 deals with the problem of estimating random state vectors. Computer simulations of performance bounds applicable to this problem are presented, with the simulations suggesting the limited value of these bounds for state estimation involving dissipative, chaotic systems and the need for novel performance bounds.

Finally,  we  emphasize  that  the  purpose  of  this  chapter  is  not  to  evaluate  the  specific 
state  estimation  algorithms  introduced  in  Chapter  4,  but  instead  to  examine  the  limitations 
imposed  by  intrinsic  aspects  of  deterministic,  chaotic  systems  on  theoretically  achievable 
state  estimator  performance  with  these  systems. 

5.2  Bounds  for  Nonrandom  State  Vectors 

As in the previous chapter, we focus on the restricted DTS/DTO scenario given by

$$ x(n+1) = f(x(n)) \tag{5.1} $$

$$ y(n) = h(x(n)) + v(n) \tag{5.2} $$

where $x(n)$ is the N-dimensional state vector, $v(n)$ is a P-dimensional, zero-mean, Gaussian white-noise sequence with covariance matrix R which is independent of the initial state $x(0)$, and h is a memoryless transformation assumed to be differentiable. Although we derive performance bounds for arbitrary, differentiable h, we provide simulations only for the special case in which h is the identity matrix. Because the CTS/DTO scenario can be cast in the form of a DTS/DTO scenario, we use the above state-space model to represent both the DTS/DTO and CTS/DTO scenarios in this chapter.
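As a concrete instance of (5.1) and (5.2), the following sketch generates a noise-corrupted orbit segment with f the Henon map (standard parameters a = 1.4, b = 0.3, assumed), h the identity, and R = σ²I, the setting used for most simulations in this chapter. The names `henon` and `simulate_dts_dto` are hypothetical.

```python
import numpy as np

def henon(x, a=1.4, b=0.3):
    """Henon map f (standard parameters assumed)."""
    return np.array([1.0 - a * x[0] ** 2 + x[1], b * x[0]])

def simulate_dts_dto(f, h, x0, n_steps, R, rng):
    """Return states x(0..n_steps-1) with x(n+1) = f(x(n)) and observations
    y(n) = h(x(n)) + v(n), where v(n) ~ N(0, R) is white and independent of x(0)."""
    xs = np.empty((n_steps, len(x0)))
    xs[0] = x0
    for n in range(1, n_steps):
        xs[n] = f(xs[n - 1])
    hx = np.array([h(x) for x in xs])
    v = rng.multivariate_normal(np.zeros(R.shape[0]), R, size=n_steps)
    return xs, hx + v
```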
The state estimation problem we focus on in this section is that of estimating $x(n_0)$, the unknown but nonrandom state, or equivalently orbit point, at a fixed time $n_0$, given an observation set $Y(M,N) = \{y(i)\}_{i=M}^{N}$. As in previous chapters, we often omit the explicit dependence of the observation set on M and N and use Y interchangeably with Y(M,N). We also let $\hat{x}(n_0)$ denote an arbitrary estimator for $x(n_0)$ based on Y, and we let $x_{n_0}$ denote the actual, unknown value of $x(n_0)$.

The two related performance measures we seek to bound are the error covariance matrix and the trace of this matrix, a quantity which equals the sum of the error variances of the components of $\hat{x}(n_0)$. For a given estimator $\hat{x}(n_0)$ for the nonrandom state vector $x(n_0)$, the error covariance matrix $P(\hat{x}(n_0))$ is given by

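$$ P(\hat{x}(n_0)) = E_{Y;x_{n_0}}\big\{[\hat{x}(n_0) - x_{n_0}][\hat{x}(n_0) - x_{n_0}]^T\big\} - B(\hat{x}(n_0))\, B^T(\hat{x}(n_0)) \tag{5.3} $$

$$ = \int [\hat{x}(n_0) - x_{n_0}][\hat{x}(n_0) - x_{n_0}]^T\, p(Y; x_{n_0})\, dY - B(\hat{x}(n_0))\, B^T(\hat{x}(n_0)) \tag{5.4} $$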


where $E_{Y;x_{n_0}}$ denotes expectation over the observation set Y given that $x(n_0) = x_{n_0}$, $p(Y; x_{n_0})$ is the likelihood function, or equivalently the PDF of Y given that $x(n_0) = x_{n_0}$, and $B(\hat{x}(n_0))$ is the bias of $\hat{x}(n_0)$, as given by

$$ B(\hat{x}(n_0)) = E_{Y;x_{n_0}}\{\hat{x}(n_0)\} - x_{n_0} \tag{5.5} $$

$$ = \int \hat{x}(n_0)\, p(Y; x_{n_0})\, dY - x_{n_0} \tag{5.6} $$
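When no closed form for the bias and error covariance is available, as with the heuristic estimators of Chapter 4, they can be approximated by Monte Carlo simulation: hold $x(n_0)$ fixed, draw many independent observation sets, and replace the integrals in (5.4) and (5.6) by sample averages. The sketch below assumes user-supplied `estimator` and `simulate_obs` callables; all names are hypothetical.

```python
import numpy as np

def mc_bias_and_covariance(estimator, simulate_obs, x_true, n_trials, rng):
    """Sample versions of the bias (5.5) and error covariance (5.3) of an
    estimator of the nonrandom state x_true.

    estimator    : maps an observation set Y to an estimate of x(n0)
    simulate_obs : draws an observation set Y given x(n0) = x_true
    """
    est = np.array([estimator(simulate_obs(x_true, rng)) for _ in range(n_trials)])
    err = est - x_true
    bias = err.mean(axis=0)                   # approximates (5.5)
    second = err.T @ err / n_trials           # approximates E{[x_hat - x][x_hat - x]^T}
    cov = second - np.outer(bias, bias)       # approximates (5.3)
    return bias, cov
```

As a sanity check, the estimator that simply returns a single observation $y(n_0)$ with h equal to the identity is unbiased with error covariance R, and the sample estimates should reproduce that.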


In  general,  performance  bounds  for  nonrandom  parameter  vectors,  including  the  two 
bounds  considered  in  this  chapter,  are  applicable  only  to  unbiased  estimators,  which  are 
those for which $B(\hat{x}(n_0)) = 0$. Although one can adapt these bounds to handle biased
estimators,  the  resulting  bounds  are  (often  undesirably)  estimator  dependent  with  some 
function  of  the  estimator  bias  appearing  in  the  bound.  This  unbiasedness  constraint  on 
the  bounds  limits  their  value,  since  practical  estimators  including  the  ML  estimator  are 
inherently  biased  in  many  estimation  problems.  However,  the  performance  bounds  for 
unbiased  estimators  considered  in  this  chapter  provide  useful  insight  into  the  problem  of 
state  estimation  with  chaotic  systems,  with  this  insight  being  potentially  relevant  to  biased 
estimators as well. In addition, one of the bounds we consider, the Cramer-Rao bound, is asymptotically tight: it is achieved asymptotically by the ML estimator (an asymptotically unbiased estimator). A limiting case of the other bound, the Barankin bound, is a tight bound, and an explicit form of the unbiased estimator achieving it is provided in [6].

5.2.1  Cramer-Rao  Bound 

The  Cramer-Rao  bound  is  perhaps  the  most  widely  used  performance  bound  for  parameter 
estimators.  One  advantage  of  the  Cramer-Rao  bound  over  other  bounds  is  the  relative  ease 
in  explicitly  deriving  it  for  many  estimation  problems,  including  the  problems  considered 
here. Furthermore, for estimation problems involving nonrandom parameters with additive noise, the bound is generally asymptotically tight in the sense that as the input SNR goes to infinity, the Cramer-Rao bound is achieved by an estimator, which in fact is the ML estimator. In addition, at any input SNR, if the error covariance matrix of some unbiased estimator satisfies this bound with equality, the estimator is also the ML estimator. Throughout the remainder of the chapter, we distinguish between the Cramer-Rao bounds for estimators of nonrandom and random parameters by denoting the bound for nonrandom parameter estimators as simply the Cramer-Rao bound and the bound for random parameter estimators as the random Cramer-Rao bound.

Use of the Cramer-Rao bound requires that the likelihood function $p(Y; x(n_0))$ exist and satisfy certain regularity constraints, specifically that it be twice differentiable with respect to the parameter $x(n_0)$ at the actual parameter value $x_{n_0}$ and that both derivatives be integrable with respect to Y. For the estimation problem of interest in this chapter, the Cramer-Rao bound on $P(\hat{x}(n_0))$, the error covariance matrix of $\hat{x}(n_0)$, is given by [86]

$$ P(\hat{x}(n_0)) \geq J^{-1}(x_{n_0}), \tag{5.7} $$

where $J(x_{n_0})$, the Fisher information matrix, is given by

$$ J(x_{n_0}) = E_{Y;x_{n_0}}\big\{ D^T_{x(n_0)}\{\log p(Y; x_{n_0})\}\, D_{x(n_0)}\{\log p(Y; x_{n_0})\} \big\} \tag{5.8} $$

and where $D_{x(n_0)}\{\log p(Y; x_{n_0})\}$ denotes the derivative of $\log p(Y; x(n_0))$ evaluated at $x_{n_0}$. We use this convention throughout the chapter, letting the subscript of the derivative operator denote the variable of differentiation and indicating the value of the variable at which the derivative is to be evaluated as an argument of the function being differentiated.

As  shown  in  Appendix  A,  for  the  DTS/DTO  scenario  given  by  (5.1)  and  (5.2),  the 
Fisher  information  matrix  reduces  to  the  following: 

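$$ J(x_{n_0}) = \sum_{i=M}^{N} D^T_{x(n_0)}\{h(f^{i-n_0}(x_{n_0}))\}\, R^{-1}\, D_{x(n_0)}\{h(f^{i-n_0}(x_{n_0}))\} \tag{5.9} $$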
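The following sketch evaluates (5.9) for the Henon map in the simulation setting of this chapter (h the identity, R = σ²I), accumulating the Jacobians of $f^{k}$ by the chain rule, $D_x\{f^k(x)\} = D\{f\}(f^{k-1}(x))\, D_x\{f^{k-1}(x)\}$, and handling past observations through the closed-form inverse $f^{-1}(u, v) = (v/b,\ u - 1 + a v^2/b^2)$. The standard parameters a = 1.4, b = 0.3 and all function names are assumptions. Because the Jacobian products grow and shrink exponentially, only modest window lengths are numerically reliable in this direct form.

```python
import numpy as np

A_H, B_H = 1.4, 0.3   # standard Henon parameters (assumed)

def f(p):
    return np.array([1.0 - A_H * p[0] ** 2 + p[1], B_H * p[0]])

def f_inv(p):
    # Closed-form inverse of the Henon map.
    x = p[1] / B_H
    return np.array([x, p[0] - 1.0 + A_H * x ** 2])

def jac_f(p):
    return np.array([[-2.0 * A_H * p[0], 1.0], [B_H, 0.0]])

def jac_f_inv(p):
    return np.array([[0.0, 1.0 / B_H], [1.0, 2.0 * A_H * p[1] / B_H ** 2]])

def fisher_information(x_n0, n_past, n_future, sigma2):
    """J(x_n0) from (5.9) with h = identity and R = sigma2 * I, for
    observations at times n0 - n_past, ..., n0 + n_future."""
    J = np.eye(2) / sigma2                      # i = n0 term: D f^0 = I
    for step, jac, n_terms in ((f, jac_f, n_future), (f_inv, jac_f_inv, n_past)):
        p, D = np.array(x_n0, dtype=float), np.eye(2)
        for _ in range(n_terms):
            D = jac(p) @ D                      # chain rule for D f^k (or D f^-k)
            p = step(p)
            J += D.T @ D / sigma2
    return J
```

Given J, the trace bound discussed below is `np.trace(np.linalg.inv(J))`, which equals the sum of the reciprocals of `np.linalg.eigvalsh(J)`.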

This expression for $J(x_{n_0})$ is closely related to the expressions defining global and local Lyapunov exponents of f. In particular, as discussed in Chapter 3, the Lyapunov exponents of f are the natural logarithms of the eigenvalues of the following matrix [21]:

$$ \Lambda_x = \lim_{n \to \infty} \big\{ D^T_x\{f^n(x)\}\, D_x\{f^n(x)\} \big\}^{1/2n}, \tag{5.10} $$

where x is a point on the attractor (except possibly one of those in a set of measure zero), whereas the local Lyapunov exponents are the natural logarithms of the eigenvalues of the following matrix [2]:

$$ \big\{ D^T_x\{f^l(x)\}\, D_x\{f^l(x)\} \big\}^{1/2l} \tag{5.11} $$

for some fixed integer l. A comparison of (5.9) and (5.10) reveals that in the special case that both h and R are $N \times N$ identity matrices, the Fisher information matrix $J(x_{n_0})$ consists of a sum of partial products of the infinite product of matrices which determines the Lyapunov exponents of f. Equivalently, for this special case the Fisher information matrix consists of a sum of matrices which determine local Lyapunov exponents for increasing values of l in (5.11). As a result, the local and global Lyapunov exponents of f strongly influence the behavior of $J(x_{n_0})$, as suggested by the simulations in Section 5.3.
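Both kinds of exponents can be computed for the Henon map as sketched below (standard parameters a = 1.4, b = 0.3 assumed; function names hypothetical). Local exponents follow (5.11) directly: the eigenvalues of $\{D^T_x\{f^l(x)\} D_x\{f^l(x)\}\}^{1/2l}$ are the singular values of $D_x\{f^l(x)\}$ raised to the power 1/l. Evaluating the limit in (5.10) directly is numerically ill-conditioned, so for the global exponents the sketch swaps in the standard QR re-orthonormalization method, which accumulates the logarithms of the diagonal of R from repeated QR factorizations of the Jacobian times the current orthonormal frame.

```python
import numpy as np

A_H, B_H = 1.4, 0.3   # standard Henon parameters (assumed)

def f(p):
    return np.array([1.0 - A_H * p[0] ** 2 + p[1], B_H * p[0]])

def jac_f(p):
    return np.array([[-2.0 * A_H * p[0], 1.0], [B_H, 0.0]])

def local_lyapunov(x, l):
    """Local exponents (5.11): log(singular values of D f^l(x)) / l."""
    p, D = np.array(x, dtype=float), np.eye(2)
    for _ in range(l):
        D = jac_f(p) @ D
        p = f(p)
    return np.log(np.linalg.svd(D, compute_uv=False)) / l

def global_lyapunov(x, n_iter):
    """Global exponents by QR re-orthonormalization (substitute for the limit in (5.10))."""
    p, Q, sums = np.array(x, dtype=float), np.eye(2), np.zeros(2)
    for _ in range(n_iter):
        Q, Rm = np.linalg.qr(jac_f(p) @ Q)
        sums += np.log(np.abs(np.diag(Rm)))   # per-step stretching factors
        p = f(p)
    return np.sort(sums / n_iter)[::-1]
```

Started on the attractor, the global estimate should come out near the values {0.42, -1.62} quoted in Section 5.3.1, and the exponents always sum to ln 0.3 (about -1.20) because the Henon Jacobian has constant determinant -b.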

In many applications, one seeks to bound the trace of $P(\hat{x}(n_0))$, which is the sum of the error variances for the components of $\hat{x}(n_0)$ and which we call the total error variance. That is, for an unbiased estimator,

$$ \mathrm{Tr}\{P(\hat{x}(n_0))\} = \sum_{i=1}^{N} E_{Y;x_{n_0}}\big\{(\hat{x}_i(n_0) - x_i(n_0))^2\big\} \tag{5.12} $$

where $\mathrm{Tr}\{\cdot\}$ denotes the trace of the bracketed matrix and where

$$ x(n_0) = [x_1(n_0), x_2(n_0), \ldots, x_N(n_0)]^T \tag{5.13} $$

$$ \hat{x}(n_0) = [\hat{x}_1(n_0), \hat{x}_2(n_0), \ldots, \hat{x}_N(n_0)]^T \tag{5.14} $$


From the Cramer-Rao inequality, it follows that

$$ \mathrm{Tr}\{P(\hat{x}(n_0))\} \geq \mathrm{Tr}\{J^{-1}(x_{n_0})\}. \tag{5.15} $$

Two relevant facts from linear algebra are that the trace of a matrix equals the sum of its eigenvalues and that the eigenvalues of an invertible matrix equal the reciprocals of the eigenvalues of its inverse. Therefore, if $\{\lambda_i\}$ denotes the set of eigenvalues of $J(x_{n_0})$, it follows that

$$ \mathrm{Tr}\{P(\hat{x}(n_0))\} \geq \sum_{i} \frac{1}{\lambda_i}. \tag{5.16} $$

Thus, the sum of the reciprocals of the eigenvalues of the Fisher information matrix provides a lower bound on the total error variance. Section 5.3 explores via computer simulation the behavior of the eigenvalues of $J^{-1}(x_{n_0})$ as a function of the number of observations for two dissipative, chaotic diffeomorphisms, the Henon map and the time-sampled Lorenz flow, and for a chaotic unit-interval map. The simulations reveal a close relation between these eigenvalues and the Lyapunov exponents of the systems, with the qualitative behavior of the eigenvalues for a given chaotic system similar to that of the eigenvalues for a deterministic LTI system which has a diagonal state transition matrix with diagonal elements given by the exponentials of the Lyapunov exponents of the chaotic system. This similarity in the
behavior  of  eigenvalues  of  Fisher  information  matrices  for  estimation  problems  involving 
chaotic  and  linear  systems  may  not  be  surprising  in  light  of  the  fact  that  the  inverse  of  the 
Fisher  information  matrix  for  a  nonlinear  estimation  problem  is  the  actual  error  covariance 
matrix  for  the  linear  estimation  problem  that  arises  by  linearizing  the  nonlinear  problem 
about  the  actual  parameter  value  and  estimating  small  perturbations  about  this  value. 
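A short worked case makes the LTI comparison concrete. Suppose, purely for illustration, that $f(x) = Fx$ with $F = \mathrm{diag}(e^{\mu_1}, \ldots, e^{\mu_N})$, that h is the identity and $R = \sigma^2 I_N$, and that only the observations at times $n_0, \ldots, n_0 + K$ are used (K rather than N counts observations here, to avoid a clash with the state dimension). Then $D_{x(n_0)}\{f^{k}(x_{n_0})\} = F^{k}$, and (5.9) is diagonal:

$$ J(x_{n_0}) = \frac{1}{\sigma^2} \sum_{k=0}^{K} F^{2k}, \qquad \lambda_i = \frac{1}{\sigma^2} \sum_{k=0}^{K} e^{2k\mu_i} = \frac{1}{\sigma^2}\,\frac{1 - e^{2(K+1)\mu_i}}{1 - e^{2\mu_i}} \quad (\mu_i \neq 0), $$

$$ \frac{1}{\lambda_i} \;\longrightarrow\; \begin{cases} 0, & \mu_i > 0, \\ \sigma^2\,(1 - e^{2\mu_i}), & \mu_i < 0, \end{cases} \qquad K \to \infty. $$

Since $1/\lambda_i$ decreases monotonically in K, (5.16) gives $\mathrm{Tr}\{P(\hat{x}(n_0))\} > \sigma^2 \sum_{\mu_i < 0} (1 - e^{2\mu_i})$ for every K. With only past observations the roles of the signs are exchanged, and the floor becomes $\sigma^2 \sum_{\mu_i > 0} (1 - e^{-2\mu_i})$; with both past and future observations every $1/\lambda_i$ tends to zero. For exponents {0.42, -1.62}, the two floors are roughly $0.96\sigma^2$ (future observations only) and $0.57\sigma^2$ (past observations only). This idealized linear case mirrors the behavior described for dissipative, chaotic diffeomorphisms; it is not a derivation for the nonlinear systems themselves.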

5.2.2  Barankin  Bound 

As  noted  in  the  previous  subsection,  for  nonlinear  parameter  estimation  problems  the  inverse 
of  the  Fisher  information  matrix  is  the  error  covariance  matrix  for  an  associated  linear 
estimation problem. As a result, for many nonlinear estimation problems, the Cramer-Rao bound is a fairly tight performance bound only with large input SNRs, with the effect of nonlinearities not accounted for by the Cramer-Rao bound becoming important as the SNR falls below a certain threshold. As we show in the next section, for state estimation with chaotic systems, the Cramer-Rao bound can be a weak bound even with moderately large input SNRs. To establish the threshold SNR at which the Cramer-Rao bound begins to lose its effectiveness, one must generally consider other performance bounds.

The Barankin bound is not a single bound but a general class, perhaps the most general, of lower bounds on the error moments of unbiased estimators for unknown, nonrandom parameters. Included in this class are the Cramer-Rao and Bhattacharyya bounds. In the most general, probabilistic setting, the Barankin bound is defined on a probability space $(\Omega, \mathcal{B}, \mu)$, where $\Omega = \{\omega\}$ is a set, $\mathcal{B}$ is a $\sigma$-algebra of subsets of $\Omega$, and $\mu$ is a probability measure defined on $\mathcal{B}$, with a family of density functions $\{p(\omega; \theta)\}$ (with respect to the measure $\mu$) assumed to exist on this space. The density functions are indexed by the parameter $\theta \in \Theta$, where $\Theta$ is a parameter set. As shown in [6], for any real-valued function $\hat{g}(\omega)$ which is measurable on $\mathcal{B}$ and unbiased in the following sense

$$ \int \hat{g}(\omega)\, p(\omega; \theta)\, d\mu(\omega) = g(\theta), \quad \theta \in \Theta, \tag{5.17} $$

the following inequality holds for all finite m, real constants $a_i$, and parameters $\theta_i \in \Theta$, $i = 1, \ldots, m$, such that the region of support of $p(\omega; \theta_0)$ contains that of each $p(\omega; \theta_i)$:


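$$ E_{\omega;\theta_0}\{(\hat{g}(\omega) - g(\theta_0))^2\} = \int (\hat{g}(\omega) - g(\theta_0))^2\, p(\omega; \theta_0)\, d\mu(\omega) \tag{5.18} $$

$$ \geq \frac{\big[\sum_{i=1}^{m} a_i\,(g(\theta_i) - g(\theta_0))\big]^2}{\int \big[\sum_{i=1}^{m} a_i\, L(\omega; \theta_i, \theta_0)\big]^2\, p(\omega; \theta_0)\, d\mu(\omega)} \tag{5.19} $$

where

$$ L(\omega; \theta_i, \theta_0) = \frac{p(\omega; \theta_i)}{p(\omega; \theta_0)}. \tag{5.20} $$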


The right-hand side of the above inequality is the Barankin bound, or more precisely an element of the class of Barankin bounds.
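For fixed test points, the maximization over the constants $a_i$ in (5.19) has a closed form, which is convenient when evaluating the bound numerically. Writing $\Delta_i = g(\theta_i) - g(\theta_0)$ and $B_{ij} = E_{\omega;\theta_0}\{L(\omega;\theta_i,\theta_0)\, L(\omega;\theta_j,\theta_0)\}$, the right-hand side of (5.19) is the ratio $(a^T\Delta)^2 / (a^T B a)$; assuming B is positive definite, the Cauchy-Schwarz inequality in the inner product defined by B gives

$$ \sup_{a \neq 0} \frac{(a^T \Delta)^2}{a^T B\, a} = \Delta^T B^{-1} \Delta, \qquad \text{attained at } a \propto B^{-1}\Delta. $$

As an illustration, take K scalar observations $y(k) = \theta + v(k)$ with $v(k) \sim \mathcal{N}(0, \sigma^2)$ white, $g(\theta) = \theta$, and the two test points $\theta_0$ (for which $L = 1$ and $\Delta = 0$) and $\theta_0 + \delta$. Then $B_{11} = B_{12} = 1$, $B_{22} = e^{K\delta^2/\sigma^2}$, and

$$ E_{\omega;\theta_0}\{(\hat{g}(\omega) - \theta_0)^2\} \geq \frac{\delta^2}{e^{K\delta^2/\sigma^2} - 1} \;\longrightarrow\; \frac{\sigma^2}{K} \quad (\delta \to 0), $$

which is the Cramer-Rao bound for this problem.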


With respect to parameter estimation, the probability space $(\Omega, \mathcal{B}, \mu)$ is the observation space, and the density functions $p(\omega; \theta_i)$ are likelihood functions, with $\theta_0$ denoting the actual value of the parameter $\theta$ one seeks to estimate and the other $\theta_i$, $i = 1, \ldots, m$, denoting other parameter values, which are typically referred to as test points. A fundamental theoretical property of the Barankin bound is that it is a tight bound, in the sense that for a fixed parameter value $\theta_0$, the least upper bound of the right-hand side of (5.19) with respect to all finite m (the number of test points), the test points themselves, and the constants $a_i$ is achievable by an unbiased estimator, an explicit expression for which is given in [6].
However, the estimator with performance achieving the bound often has little practical
value,  in  part  because  the  form  of  the  estimator  is  a  function  of  the  unknown  parameter 
value. 

Appendix A provides the equations for a restricted form of the Barankin bound for vector-valued parameters, originally derived in [58, 59], as specialized to the problem scenario of interest in this chapter with the transformation h in (5.2) equal to the identity operator. As shown in the appendix, this restricted form of the Barankin bound is expressible as a sum of two components: one the inverse of the Fisher information matrix, and the other a positive semidefinite matrix which depends upon the test points, the observation noise covariance matrix, and the number of observations. For the problem scenario of interest here, a strong sensitivity of this second component to the test points, observation noise covariance matrix, and observations arises through the matrix B, each element of which is given by

$$ B_{ij} = \exp\Big\{ \sum_{k=M}^{N} \big[f^{k-n_0}(x_i(n_0)) - f^{k-n_0}(x_{n_0})\big]^T R^{-1} \big[f^{k-n_0}(x_j(n_0)) - f^{k-n_0}(x_{n_0})\big] \Big\}, \quad i, j = 1, \ldots, m, \tag{5.21} $$

where m is the number of test points, and $x_i(n_0)$ and $x_j(n_0)$ denote test points, i.e., values of the state vector $x(n_0)$ other than $x_{n_0}$. Note that for i = j, $B_{ii}$ is the positive exponential of the weighted squared distance between two orbit segments, one containing the actual parameter value and the other containing a test point, with the inverse of the noise covariance matrix providing the weights. Since the term grows exponentially with this weighted distance, it is extremely sensitive to changes in the noise covariance matrix, the number of points in the segments, and the choice of test points. Because the inverse of B enters the second component of the bound, the influence of this second component on the overall bound is greatest when the weighted distances between orbit segments are smallest. As a consequence, since decreasing the input SNR (increasing R) shrinks these weighted distances, the influence of the second component of this restricted form of the Barankin bound on the overall bound becomes greater as the input SNR decreases. Such behavior of the second component is desirable, since the first component, the inverse of the Fisher information matrix or equivalently the Cramer-Rao bound, becomes tighter as the input SNR increases.
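For additive white Gaussian noise, $E_{\omega;\theta_0}\{L_i L_j\} = \exp(d_i^T R^{-1} d_j)$, where $d_i$ stacks the differences between the noise-free observations under test point i and under the actual parameter; this is the exponential structure of B in (5.21). The sketch below uses that fact to compute the scalar Barankin bound (5.19), maximized over the constants $a_i$ as $\Delta^T B^{-1} \Delta$. It is not the restricted matrix-valued form of Appendix A; the function name and the R = σ²I assumption are hypothetical choices for illustration.

```python
import numpy as np

def barankin_scalar(mean_true, mean_tests, g_true, g_tests, sigma2):
    """Scalar Barankin bound (5.19), maximized over the constants a_i, for
    observations with parameter-dependent noise-free values and white
    Gaussian noise of variance sigma2 per component.

    mean_true  : (K, d) noise-free observations under theta_0
    mean_tests : (m, K, d) noise-free observations under each test point
    g_true     : g(theta_0);  g_tests : (m,) values g(theta_i)
    """
    mean_true = np.asarray(mean_true, dtype=float)
    mean_tests = np.asarray(mean_tests, dtype=float)
    diffs = (mean_tests - mean_true[None]).reshape(len(mean_tests), -1)
    # E{L_i L_j} under theta_0 is exp(d_i^T R^{-1} d_j) for Gaussian noise.
    B = np.exp(diffs @ diffs.T / sigma2)
    delta = np.asarray(g_tests, dtype=float) - g_true
    return float(delta @ np.linalg.solve(B, delta))
```

Including $\theta_0$ itself among the test points lets the bound approach the Cramer-Rao bound as another test point approaches $\theta_0$, and adding test points can never lower the bound, since the larger set of constants includes the smaller one.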
5.3 Computer Simulations


In this section, we present and qualitatively analyze computer simulations of the Cramer-Rao and Barankin bounds for three chaotic systems: the Henon map, the time-sampled Lorenz flow, and the generalized shift map. The first two systems are dissipative diffeomorphisms, whereas the third is neither invertible nor dissipative. The simulations indicate that for dissipative, chaotic diffeomorphisms there is a nonzero lower bound on the total error variance for estimators of $x(n_0)$ given observations only for times less than or equal to $n_0$ or only for times greater than or equal to $n_0$, but this lower bound asymptotically approaches zero as the number of observations for times both greater than and less than $n_0$ increases. The simulations also indicate that the Cramer-Rao bound can be a weak bound, even with moderately large input SNRs, for the problem of estimating the initial condition of an orbit segment generated by a noninvertible, chaotic map. This weakness suggests the need for other bounds, such as the Barankin bound, to realistically assess achievable state estimator performance with these systems.

Various  aspects  of  the  bounds  are  considered,  but  the  emphasis  is  on  the  behavior  of  the 
eigenvalues  of  the  bounding  matrices  (for  the  dissipative  systems)  as  a  function  of  the  input 
SNR  and  the  number  of  observations.  This  emphasis  on  eigenvalues  arises  from  the  close 
relation  between  these  eigenvalues  and  the  system  Lyapunov  exponents.  As  a  consequence 
of  this  relation,  the  behavior  of  the  eigenvalues  reflects  the  influence  of  the  system  Lyapunov 
exponents  on  the  performance  bound. 

The  question  arises  as  to  how  one  chooses  test  points  used  in  the  Barankin  bound.  As 
noted  in  Section  5.2,  the  restricted  Barankin  bound  used  in  this  chapter  is  expressible  as 
the  sum  of  two  components,  one  the  inverse  of  the  Fisher  information  matrix,  and  the 
other  a  function  of  the  orbit  segments  corresponding  to  these  test  points,  the  number  of 
observations,  and  the  observation  noise  covariance  matrix.  Also  noted  in  that  section  is  the 
fact  that  the  influence  of  this  second  component  on  the  overall  bound  is  greatest  when  the 
distance  between  these  orbit  segments  and  that  corresponding  to  the  actual  parameter  value 
is smallest. With this in mind, we chose the test points for the dissipative systems used in the simulations by means of the orbit matching approach introduced in Chapter 4. Specifically, a reference orbit was generated, and the m reference orbit points (m being the number of test points) whose corresponding orbit segments were closest to the orbit segment corresponding to the actual state vector were chosen as test points. For the unit-interval map, the m points with orbits differing from that of the actual initial condition in the fewest points were used as test points.
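The orbit matching step can be sketched as a nearest-segment search over a stored reference orbit. This assumes R = σ²I, so that ordering by the weighted distance in (5.21) reduces to ordering by Euclidean distance; the function name is hypothetical. The returned index j identifies the candidate segment `ref[j : j + K]`, and the test point itself is the state at the position of $n_0$ within that segment. If the actual orbit happens to be part of the reference orbit, the exact match (distance zero) should be discarded, since test points are values other than $x_{n_0}$.

```python
import numpy as np

def select_test_points(true_segment, ref, m):
    """Return the m reference-orbit indices whose segments ref[j : j + K] are
    closest (summed squared Euclidean distance) to true_segment, plus those distances."""
    K = len(true_segment)
    n_cand = len(ref) - K + 1
    idx = np.arange(n_cand)[:, None] + np.arange(K)[None, :]
    d2 = np.sum((ref[idx] - true_segment[None]) ** 2, axis=(1, 2))
    best = np.argsort(d2)[:m]
    return best, d2[best]
```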


5.3.1  Simulations  with  Dissipative,  Chaotic  Diffeomorphisms 

In this section, we provide computer simulations of the Cramer-Rao and restricted Barankin
bounds for the Henon map and the time-sampled Lorenz flow. The sampling interval for the
Lorenz flow used in all simulations was 0.005 seconds. In addition, with the exception of the
results depicted in Figures 5-1 through 5-6, all results were obtained with normalized systems
for which the experimentally obtained signal variance of each component of the state vector
was identical. In other words, the components of the state vector were individually scaled
so that they all shared the same variance. The scaling served two purposes. First, it
permitted the use of a single observation noise intensity to achieve the same input SNR for
each component of the state vector. Second, it prevented any one component of the state
vector from dominating the total error variance. The original systems, without scaled state
vectors, were used for the results shown in Figures 5-1 through 5-6 to facilitate the
comparison of these systems with diagonal, linear systems having the same Lyapunov
exponents. Except where stated otherwise, all simulation results were obtained with the
observation noise covariance matrix given by $\sigma^2 I_N$, where $I_N$ is the $N \times N$
identity matrix, and with the output transformation equal to the identity operator.
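The normalization and the choice of a single noise level can be sketched as below. This is an illustration only: it uses the Henon map (standard parameters a = 1.4, b = 0.3) rather than both systems, and it assumes the per-component input SNR is defined as 10 log10(signal variance / σ²), which the surrounding text does not state explicitly. The helper names are hypothetical.

```python
import numpy as np

def henon_orbit(n, a=1.4, b=0.3, burn_in=1000):
    """n consecutive Henon states after discarding a transient."""
    x, y = 0.0, 0.0
    out = np.empty((n, 2))
    for k in range(burn_in + n):
        x, y = 1.0 - a * x * x + y, b * x
        if k >= burn_in:
            out[k - burn_in] = (x, y)
    return out

def normalize_components(orb):
    """Scale each state component to unit sample variance."""
    scale = orb.std(axis=0)
    return orb / scale, scale

def noise_std_for_snr(snr_db, signal_var=1.0):
    """sigma such that 10*log10(signal_var / sigma**2) equals snr_db (assumed SNR definition)."""
    return np.sqrt(signal_var / 10.0 ** (snr_db / 10.0))

z, scale = normalize_components(henon_orbit(20000))
sigma = noise_std_for_snr(10.0)          # one sigma serves every component
rng = np.random.default_rng(0)
y = z + sigma * rng.standard_normal(z.shape)   # observations, noise covariance sigma^2 I
```

Rescaling each coordinate by a constant is a linear change of variables, so the scaled system has the same Lyapunov exponents as the original.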

Figures 5-1 through 5-3 (a) depict the eigenvalues of the inverse of the Fisher information
matrix for the Henon map as a function of the number of future, past, and both past and
future observations, respectively, with an arbitrary point on the attractor used as $x_{n_0}$.
Numerical problems arise when calculating the eigenvalues of the Fisher information matrix
for the Henon map as the number of observations increases. As a result, only relatively small
numbers of observations could be used for the figures. Figures 5-1 through 5-3 (b) depict
analogous information for the unstable, diagonal, linear system $F_h$, which has the same set
of Lyapunov exponents $\{0.42, -1.62\}$ as the Henon map and is given by


$$F_h = \begin{bmatrix} e^{0.42} & 0 \\ 0 & e^{-1.62} \end{bmatrix} \tag{5.22}$$
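The normalized eigenvalues plotted in these figures can be sketched for the case of future observations. The sketch assumes the standard Henon parameters a = 1.4 and b = 0.3, an identity output transformation, white Gaussian noise with covariance σ²I, and that "normalized" means the eigenvalues of J⁻¹ divided by σ². Under these assumptions J(x₀) = σ⁻² Σₙ (Dfⁿ(x₀))ᵀ Dfⁿ(x₀), with Dfⁿ accumulated by the chain rule. The function names are hypothetical.

```python
import numpy as np

A_PARAM, B_PARAM = 1.4, 0.3  # standard Henon parameters (assumed)

def henon(v):
    """One step of the Henon map."""
    x, y = v
    return np.array([1.0 - A_PARAM * x * x + y, B_PARAM * x])

def henon_jacobian(v):
    """Jacobian Df evaluated at v."""
    return np.array([[-2.0 * A_PARAM * v[0], 1.0], [B_PARAM, 0.0]])

def normalized_crb_eigs_future(x0, N):
    """Sorted eigenvalues of J^{-1}(x0) / sigma^2 for observations at times 0..N."""
    info = np.zeros((2, 2))
    D = np.eye(2)                     # Df^0 is the identity
    v = np.asarray(x0, dtype=float)
    for _ in range(N + 1):
        info += D.T @ D               # term for the observation at this time
        D = henon_jacobian(v) @ D     # chain rule: Df^{n+1} = Df(f^n(x0)) Df^n
        v = henon(v)
    return np.sort(np.linalg.eigvalsh(np.linalg.inv(info)))

def normalized_crb_eigs_linear(lams, N):
    """Same quantity for the diagonal system F = diag(exp(lams)), cf. (5.22)."""
    n = np.arange(N + 1)[:, None]
    return np.sort(1.0 / np.exp(2.0 * n * np.asarray(lams)).sum(axis=0))

x0 = np.zeros(2)
for _ in range(1000):                 # iterate onto the attractor
    x0 = henon(x0)
for N in (2, 4, 8):
    print(N, normalized_crb_eigs_future(x0, N),
          normalized_crb_eigs_linear([0.42, -1.62], N))
```

The information along the unstable direction grows roughly like e^{0.84N}, so the matrix becomes ill-conditioned as N grows, which is consistent with the numerical problems mentioned above.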
Figure 5-1: Normalized eigenvalues of $J^{-1}(x_{n_0})$ with observation set $Y(n_0, n_0 + N)$.
(a) Henon map; (b) $F_h$.




Figure 5-2: Normalized eigenvalues of $J^{-1}(x_{n_0})$ with observation set $Y(n_0 - N, n_0)$.
(a) Henon map; (b) $F_h$.


Figure 5-3: Normalized eigenvalues of $J^{-1}(x_{n_0})$ with observation set $Y(n_0 - N, n_0 + N)$.
(a) Henon map; (b) $F_h$.


Figures 5-4 through 5-6 depict analogous information for the time-sampled Lorenz flow and
the unstable, diagonal, linear system $F_l$, which has the same set of Lyapunov exponents
$\{-0.1125, 0.0075, 0\}$ as the Lorenz flow with a sampling interval of 0.005 seconds and is given by


$$F_l = \begin{bmatrix} e^{0.0075} & 0 & 0 \\ 0 & e^{0} & 0 \\ 0 & 0 & e^{-0.1125} \end{bmatrix} \tag{5.23}$$


Figure 5-4: Normalized eigenvalues of $J^{-1}(x_{n_0})$ with observation set $Y(n_0, n_0 + N)$.
(a) Sampled Lorenz flow; (b) $F_l$.


Figure 5-5: Normalized eigenvalues of $J^{-1}(x_{n_0})$ with observation set $Y(n_0 - N, n_0)$.
(a) Sampled Lorenz flow; (b) $F_l$.

Figure 5-6: Normalized eigenvalues of $J^{-1}(x_{n_0})$ with observation set $Y(n_0 - N, n_0 + N)$.
(a) Sampled Lorenz flow; (b) $F_l$.

As indicated by Figures 5-1 through 5-6, for both the Henon and Lorenz systems the eigenvalues
of $J^{-1}(x_{n_0})$ have the same qualitative behavior as those of the corresponding unstable,
linear systems. In light of this, it is useful to analyze the inverse of the Fisher information
matrix for diagonal, LTI systems to better understand the behavior of the inverse of the
Fisher information matrix, and consequently of the Cramer-Rao bound, for chaotic systems.
Properties of the error covariance matrix of optimal, Bayesian state estimators for arbitrary,
noise-driven linear systems have been studied extensively in the past. We only sketch the
highlights of these properties relevant to the specific problem of interest here. For an LTI
system with state and observation equations given by


$$x(n+1) = F x(n) \tag{5.24}$$

$$y(n) = x(n) + v(n) \tag{5.25}$$


where $\{v(n)\}$ is a Gaussian, white-noise sequence with covariance matrix $\sigma^2 I_N$, and
where $F$ is an $N \times N$ diagonal matrix given by


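$$F = \begin{bmatrix} e^{\lambda_1} & 0 & \cdots & 0 \\ 0 & e^{\lambda_2} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & e^{\lambda_N} \end{bmatrix} \tag{5.26}$$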


with the $\lambda_i$ assumed to be real and distinct, the inverse of the Fisher information matrix
$J^{-1}(x_{n_0})$ for arbitrary $x_{n_0}$ and observation set $Y(M, N)$ is given by


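$$J^{-1}(x_{n_0}) = \sigma^2 \begin{bmatrix} S_1^{-1} & 0 & \cdots & 0 \\ 0 & S_2^{-1} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & S_N^{-1} \end{bmatrix} \tag{5.27}$$

where

$$S_j^{-1} = e^{2 n_0 \lambda_j} \, \frac{1 - e^{2\lambda_j}}{e^{2M\lambda_j} - e^{(2N+2)\lambda_j}} \tag{5.28}$$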
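Expression (5.28) is a summed geometric series: component j of x(n) equals e^{(n - n₀)λⱼ} xⱼ(n₀), so with white noise of covariance σ²I, component j contributes σ⁻² Σ_{n=M}^{N} e^{2(n - n₀)λⱼ} to the Fisher information, and Sⱼ is that sum. The sketch below checks the closed form against direct summation for the exponents used in (5.22) and (5.23); the function names are hypothetical.

```python
import numpy as np

def s_inv_closed_form(lam, n0, M, N):
    """S_j^{-1} from (5.28); valid for lam != 0."""
    return (np.exp(2 * n0 * lam) * (1.0 - np.exp(2 * lam))
            / (np.exp(2 * M * lam) - np.exp((2 * N + 2) * lam)))

def s_inv_direct(lam, n0, M, N):
    """Reciprocal of sum_{n=M}^{N} exp(2 (n - n0) lam), i.e. sigma^2 divided by
    the Fisher information of x_j(n0) from y_j(n) = exp((n - n0) lam) x_j(n0) + v_j(n)."""
    n = np.arange(M, N + 1)
    return 1.0 / np.sum(np.exp(2.0 * (n - n0) * lam))

for lam in (0.42, -1.62, 0.0075, -0.1125):
    print(lam, s_inv_closed_form(lam, 0, -5, 7), s_inv_direct(lam, 0, -5, 7))
```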


Because $J^{-1}(x_{n_0})$ is a diagonal matrix, its eigenvalues are its diagonal elements, and
consequently analyzing the eigenvalues of $J^{-1}(x_{n_0})$ entails analyzing the diagonal
elements. Our interest is in the scaling behavior of each diagonal term $S_j^{-1}$ as $N$ becomes
increasingly positive and $M$ becomes increasingly negative. The results are the following:

• If $\lambda_j > 0$, then the term $e^{2M\lambda_j}$ grows smaller as $M$ becomes increasingly
negative, so that for large negative values of $M$

$$S_j^{-1} \approx e^{(2n_0 - 2)\lambda_j} \left(e^{2\lambda_j} - 1\right) e^{-2N\lambda_j} \tag{5.29}$$

which goes to zero only if $N \to \infty$, and does so exponentially in $N$, as $e^{-2N\lambda_j}$.
Thus, for $\lambda_j > 0$ and a fixed, finite value of $N$, $S_j^{-1} > C(N) > 0$ for some constant
$C(N)$ which depends on $N$.

• If $\lambda_j < 0$, then the situation is reversed: $e^{2N\lambda_j}$ grows smaller as $N$
becomes increasingly positive, so that for large positive values of $N$

$$S_j^{-1} \approx e^{2n_0\lambda_j} \left(1 - e^{2\lambda_j}\right) e^{-2M\lambda_j} \tag{5.30}$$

which goes to zero only if $M \to -\infty$, and does so exponentially in $M$, as $e^{-2M\lambda_j}$.
Thus, for $\lambda_j < 0$ and a fixed, finite value of $M$, $S_j^{-1} > C'(M) > 0$ for some
constant $C'(M)$ which depends on $M$.

• If $\lambda_j = 0$, then

$$S_j^{-1} = \frac{1}{N - M + 1} \tag{5.31}$$

which is the reciprocal of the total number of observations. Thus, for $\lambda_j = 0$, $S_j^{-1}$
scales as the reciprocal of the number of observations.
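The three cases above can be checked numerically by computing Sⱼ⁻¹ as the reciprocal of the finite sum and comparing it with the limiting forms (5.29) and (5.30). The sketch (hypothetical helper names) also shows that for λⱼ > 0 and fixed N, adding past observations never pushes Sⱼ⁻¹ below the value in (5.29). That value can therefore serve as the constant C(N), and symmetrically (5.30) serves as C'(M) for λⱼ < 0.

```python
import numpy as np

def s_inv(lam, n0, M, N):
    """S_j^{-1}: reciprocal of sum_{n=M}^{N} exp(2 (n - n0) lam)."""
    n = np.arange(M, N + 1)
    return 1.0 / np.sum(np.exp(2.0 * (n - n0) * lam))

def approx_positive(lam, n0, N):
    """Right-hand side of (5.29): limit of S_j^{-1} as M -> -infinity (lam > 0)."""
    return np.exp((2 * n0 - 2) * lam) * (np.exp(2 * lam) - 1.0) * np.exp(-2 * N * lam)

def approx_negative(lam, n0, M):
    """Right-hand side of (5.30): limit of S_j^{-1} as N -> +infinity (lam < 0)."""
    return np.exp(2 * n0 * lam) * (1.0 - np.exp(2 * lam)) * np.exp(-2 * M * lam)

# lam > 0, N fixed at 5: more past observations only approach the floor (5.29).
for M in (-1, -10, -40):
    print(M, s_inv(0.42, 0, M, 5), approx_positive(0.42, 0, 5))
# lam = 0: the reciprocal of the number of observations, cf. (5.31).
print(s_inv(0.0, 0, -7, 12))
```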


As noted earlier, for deterministic, LTI systems with real-valued eigenvalues, none of which
are zero, the Lyapunov exponents are the logarithms of the absolute values of the eigenvalues.
Thus, the Lyapunov exponents of $F$ consist of the set $\{\lambda_1, \ldots, \lambda_N\}$. As such,
the above results indicate that for diagonal, LTI systems with both positive and negative
Lyapunov exponents, the bound on the total error variance of unbiased estimators for $x_{n_0}$,
as given by the sum of the eigenvalues of the inverse of the Fisher information matrix, is
nonzero when there are only a finite number of observations for times before $n_0$, even if
the number of observations for times after $n_0$ goes to infinity. Similarly, the bound is
nonzero when there are only a finite number of observations for times after $n_0$, even if the
number of observations for times before $n_0$ goes to infinity. Only as the numbers of
observations for times both before and after $n_0$ go to infinity does the sum of the
eigenvalues, and thus the lower bound on the total error variance, decay asymptotically to zero.
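A small numerical illustration of this conclusion, using the exponents {0.42, -1.62} of (5.22) with n₀ = 0, is sketched below. The quantity computed is the trace of J⁻¹/σ², that is, the sum of the Sⱼ⁻¹ terms. With future-only observations it levels off near 1 - e^{-3.24} (set by the negative exponent), with past-only observations near 1 - e^{-0.84} (set by the positive exponent), and only with both does it go to zero. The function name is hypothetical.

```python
import numpy as np

def total_normalized_crb(lams, M, N, n0=0):
    """Trace of J^{-1}(x_{n0}) / sigma^2 for F = diag(exp(lams)), observations at n = M..N."""
    n = np.arange(M, N + 1)[:, None]
    return float(np.sum(1.0 / np.exp(2.0 * (n - n0) * np.asarray(lams)).sum(axis=0)))

lams = [0.42, -1.62]   # Lyapunov exponents of F_h in (5.22)
for K in (1, 5, 20, 80):
    future = total_normalized_crb(lams, 0, K)
    past = total_normalized_crb(lams, -K, 0)
    both = total_normalized_crb(lams, -K, K)
    print(f"K={K:3d}  future={future:.4g}  past={past:.4g}  both={both:.4g}")
```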

Since dissipative, chaotic systems have both positive and negative Lyapunov exponents, and in
light of the similar, experimentally observed, qualitative behavior of the eigenvalues of
$J^{-1}(x_{n_0})$ for a chaotic system and for the diagonal LTI system with the same Lyapunov
exponents, it appears that a similar result holds for chaotic systems, in the sense that there
is a nonzero lower bound on the total error variance of unbiased state estimators for $x_{n_0}$
given only observations for times before or after $n_0$. The practical implication is that
achieving large SNR gains when performing state estimation with dissipative, chaotic systems
requires the use of observations both before and after the time of interest.
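The exponents {0.42, -1.62} attributed to the Henon map, and the fact that a dissipative map has both a positive and a negative exponent, can be checked with the standard QR (Benettin) method. The sketch assumes the standard parameters a = 1.4 and b = 0.3. Because |det Df| = b at every point, the two estimated exponents must sum to ln 0.3 ≈ -1.20, which is negative as required of a dissipative map. The function name is hypothetical.

```python
import numpy as np

def henon_lyapunov(n_steps=50_000, a=1.4, b=0.3, burn_in=1000):
    """Estimate both Lyapunov exponents of the Henon map by propagating an
    orthonormal frame with the Jacobian and re-orthonormalizing by QR."""
    v = np.zeros(2)
    for _ in range(burn_in):                         # move onto the attractor
        v = np.array([1.0 - a * v[0] ** 2 + v[1], b * v[0]])
    Q = np.eye(2)
    log_sums = np.zeros(2)
    for _ in range(n_steps):
        J = np.array([[-2.0 * a * v[0], 1.0], [b, 0.0]])
        Q, R = np.linalg.qr(J @ Q)
        log_sums += np.log(np.abs(np.diag(R)))      # local stretching factors
        v = np.array([1.0 - a * v[0] ** 2 + v[1], b * v[0]])
    return log_sums / n_steps

exps = henon_lyapunov()
print(exps, exps.sum(), np.log(0.3))   # the sum equals log|det Df| = log b
```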

It is tempting to take this analogy between chaotic systems and unstable LTI systems one step
further and conclude that, as with deterministic LTI systems and independent of the input SNR,
the uncertainty in the value of the state at time $n_0$ (or at least a bound on this
uncertainty) decays exponentially as the number of observations for times both before and
after $n_0$ increases, or at worst decays as the reciprocal of the number of observations.
However, as noted earlier, the Cramer-Rao bound for nonlinear systems only reflects properties
of the locally linearized system. An intrinsic nonlinearity of dissipative, chaotic systems is
the boundedness of orbits on the attractor. As one might expect, the size of a chaotic
attractor, as given by the variance of orbits on the attractor, influences achievable state
estimator performance, especially as the attractor size and observation noise intensity become
comparable. Because the Cramer-Rao bound does not account for this nonlinearity, the bound
becomes weaker as the input SNR decreases, as is the case with most nonlinear systems. The
question arises as to the threshold input SNR at which the Cramer-Rao bound begins to lose its
effectiveness for chaotic systems, and the rate at which it does so. One particularly
illustrative way to answer this question, at least qualitatively, is with the Barankin bound.

Figures 5-7 through 5-12 (a) depict the eigenvalues of the Cramer-Rao bound as functions of the
number of future, past, and both past and future observations, respectively (Figures 5-7
through 5-9 for the Henon map and Figures 5-10 through 5-12 for the time-sampled Lorenz flow),
with an input SNR of 10 dB and with an arbitrarily selected point on the attractor used as
$x_{n_0}$. In contrast to the earlier figures, the plotted results are not normalized by the
noise variance $\sigma^2$. Figures 5-7 through 5-12 (b) depict analogous information for the
restricted Barankin bound with 5 test points.


Figure 5-7: Eigenvalues of Cramer-Rao and Barankin bounds with observation set
$Y(n_0, n_0 + N)$ for Henon map. (a) Cramer-Rao; (b) Barankin.


Figure 5-8: Eigenvalues of Cramer-Rao and Barankin bounds with observation set
$Y(n_0 - N, n_0)$ for Henon map. (a) Cramer-Rao; (b) Barankin.

Figure 5-9: Eigenvalues of Cramer-Rao and Barankin bounds with observation set
$Y(n_0 - N, n_0 + N)$ for Henon map. (a) Cramer-Rao; (b) Barankin.

Figure 5-10: Eigenvalues of Cramer-Rao and Barankin bounds with observation set
$Y(n_0, n_0 + N)$ for time-sampled Lorenz flow. (a) Cramer-Rao; (b) Barankin.

Figure 5-11: Eigenvalues of Cramer-Rao and Barankin bounds with observation set
$Y(n_0 - N, n_0)$ for time-sampled Lorenz flow. (a) Cramer-Rao; (b) Barankin.

Figure 5-12: Eigenvalues of Cramer-Rao and Barankin bounds with observation set
$Y(n_0 - N, n_0 + N)$ for time-sampled Lorenz flow. (a) Cramer-Rao; (b) Barankin.

A comparison of the results in part (a) with those in part (b) for each of the figures
indicates that the decay rate of the most rapidly decaying eigenvalue with observations at
times before $n_0$ is much smaller for the Barankin bound than for the Cramer-Rao bound,
thereby suggesting that the exponential decay rates associated with the Cramer-Rao bound are
overly optimistic. Figures 5-13 and 5-14 depict the eigenvalue sums for the Cramer-Rao and
Barankin bounds as a function of the input SNR for two different values of $N$ in the
observation set $Y(n_0 - N, n_0 + N)$. The figures indicate that for both systems, the
Cramer-Rao and Barankin bounds on the total error variance deviate at input SNRs below a
threshold that depends upon both the system and the number of observations. The results
suggest the value of the Barankin bound in assessing achievable state estimation performance
with dissipative, chaotic systems at input SNRs below 20 dB.


Figure 5-13: Sum of eigenvalues of Cramer-Rao and Barankin bounds with observation set
$Y(n_0 - N, n_0 + N)$ for Henon map. (a) $N = 6$; (b) $N = 12$.

Figure 5-14: Sum of eigenvalues of Cramer-Rao and Barankin bounds with observation set
$Y(n_0 - N, n_0 + N)$ for time-sampled Lorenz flow. (a) $N = 20$; (b) $N = 40$.

5.3.2  Simulations  with  Unit-Interval  Maps 

In this section, we consider performance bounds on unbiased estimators of the initial condition
for a noninvertible, chaotic map. Although we focus on unit-interval maps, the results are
relevant to multidimensional chaotic maps as well. In particular, we consider bounds on
estimators for the initial condition $x(0)$ of the shift map with observation set $Y(0, N)$,
where the general form of the shift map is given by

$$x(n+1) = f(x(n)) = \alpha x(n) \pmod{\beta} \tag{5.32}$$

where $\pmod{\beta}$ denotes reduction modulo $\beta$ and where $\alpha$ is an integer with
absolute value greater than one. Figure 5-15 depicts the function $f$ for the parameter pair
$(\alpha = 4, \beta = 1)$, the pair used for all examples in this section.


Figure 5-15: Shift map with parameter pair $(\alpha = 4, \beta = 1)$.

As indicated in the figure, the shift map is discontinuous, with $\alpha - 1$ points of
discontinuity. However, the map is differentiable except at these $\alpha - 1$ points and has
the constant derivative $\alpha$. The Cramer-Rao bound is defined for each initial condition
$x_0$ which is not a discontinuity point of any of the composed functions $\{f^i\}_{i=0}^{N}$.
For each such initial condition and observation set $Y(0, N)$, a straightforward derivation
reveals that the inverse of the Fisher information $J^{-1}(x(0))$ is given by

$$J^{-1}(x(0)) = \sigma^2 \, \frac{\alpha^2 - 1}{\alpha^{2(N+1)} - 1} \tag{5.33}$$

where $\sigma^2$ is the variance of the observation noise. This expression decays exponentially
with the number of observations, by a factor of approximately $\alpha^{-2}$ per additional
observation. A similar result, exponential decay of the Cramer-Rao bound with the number of
future observations, holds for all other unit-interval maps having positive Lyapunov exponents.
However, for the shift map, each point has $\alpha$ preimages under $f$; as a consequence, and
independent of the number of observations, $\alpha - 1$ orbit segments differ from that
generated by $x_0$ in a single point. Similarly, $\alpha^i - 1$ orbit segments differ from that
generated by $x_0$ in at most $i$ points for each integer $i$. In light of this, one might expect
the Cramer-Rao bound to be a weak bound for the shift map even with moderately large input SNRs.
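Both facts used above, the closed form (5.33) and the existence of orbit segments that differ from the true one in only a few points, can be checked exactly for (α = 4, β = 1) with rational arithmetic. The sketch treats the initial condition 1/7 as a hypothetical example (it is never a discontinuity point of any fⁱ) and uses the fact that the other solutions of fⁱ(x) = fⁱ(x₀) are x₀ + kβ/αⁱ (mod β). The function names are hypothetical.

```python
from fractions import Fraction

ALPHA, BETA = 4, 1  # parameter pair used in this section

def shift_map(x):
    """One step of (5.32), x -> alpha * x mod beta, computed exactly."""
    return (ALPHA * x) % BETA

def orbit_segment(x0, N):
    """Orbit segment x(0), ..., x(N) starting from x0."""
    seg = [x0]
    for _ in range(N):
        seg.append(shift_map(seg[-1]))
    return seg

def crb_closed_form(sigma2, N, alpha=ALPHA):
    """(5.33): sigma^2 (alpha^2 - 1) / (alpha^(2(N+1)) - 1)."""
    return sigma2 * (alpha**2 - 1) / (alpha**(2 * (N + 1)) - 1)

def crb_direct(sigma2, N, alpha=ALPHA):
    """sigma^2 / sum_{n=0}^{N} (d f^n / dx)^2, using d f^n / dx = alpha^n."""
    return sigma2 / sum(alpha**(2 * n) for n in range(N + 1))

def count_differences(x0, other, N):
    """Number of time indices at which two orbit segments differ."""
    return sum(a != b for a, b in zip(orbit_segment(x0, N), orbit_segment(other, N)))

x0 = Fraction(1, 7)  # hypothetical initial condition
N = 6
diffs_one = [count_differences(x0, (x0 + Fraction(k * BETA, ALPHA)) % BETA, N)
             for k in range(1, ALPHA)]
diffs_two = [count_differences(x0, (x0 + Fraction(k * BETA, ALPHA**2)) % BETA, N)
             for k in range(1, ALPHA**2)]
print(diffs_one, max(diffs_two))  # prints [1, 1, 1] 2
```

However many observations are taken, these alternative initial conditions produce observation sequences that disagree with the true one in only one or two samples. A local (derivative-based) bound such as the Cramer-Rao bound cannot see them.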

Figure 5-16: Cramer-Rao and Barankin bounds for initial condition estimators of the shift map
with parameter pair $(\alpha = 4, \beta = 1)$ and with observation set $Y(0, N)$. (a) Input SNR
= 15 dB; (b) Input SNR = 5 dB.

Figures 5-16 and 5-17 confirm this expectation. The figures depict the Cramer-Rao and Barankin
bounds on the performance of unbiased estimators for $x(0)$ for the shift map as a function of
the number of observations and of the input SNR, respectively. Eight test points were used in
the Barankin bound for the results shown in the figures, with the orbit segments of the test
points differing from that of the actual initial condition in at most 2 orbit points.
Figure 5-17: Cramer-Rao and Barankin bounds for initial condition estimators of the shift map
with parameter pair $(\alpha = 4, \beta = 1)$ and with observation set $Y(0, N)$. (a) $N = 4$;
(b) $N = 8$.

The figures indicate that the Cramer-Rao bound becomes a progressively weaker bound as the
number of observations increases. In the figures, the nonzero limiting behavior of the Barankin
bound arises from the noninvertibility of the shift map. Divergence of the Cramer-Rao and
Barankin bounds is not unique to the shift map; it can be expected with any chaotic,
unit-interval map and with certain noninvertible, multidimensional chaotic maps as well.

5.4  Bounds  for  Random  State  Vectors 

A fundamental limitation of performance bounds for estimators of nonrandom parameters is that
they are either applicable only to unbiased estimators or they are estimator-dependent, with
the dependency generally a function of the estimator bias. In contrast, many performance bounds
for estimators of random parameters are applicable to both biased and unbiased estimators, and
the bounds are not estimator dependent. In this section, we briefly consider performance bounds
for state estimators of dissipative, chaotic systems when the unknown state vector $x(n_0)$ is a
random vector with a known a priori PDF. Experimental results are presented which suggest that
widely used performance bounds for estimators of random parameters have limited value with
dissipative, chaotic maps. The specific problem we consider is that of estimating the random
initial condition $x(0)$ of an orbit segment generated by a dissipative, chaotic diffeomorphism
given the $(N+1)$-point observation set $Y(0, N)$, for the special case in which $p(x(0))$, the a
priori PDF of the initial condition, is Gaussian with a known mean vector and covariance matrix
$\gamma^2 I_N$. All experimental results in this section were obtained with the time-sampled
Lorenz flow. Because the Lorenz attractor has a bounded region of attraction, the mean vector
and covariance matrix of $p(x(0))$ used in obtaining the experimental results were chosen to
ensure that the probability of an initial condition lying outside this region was extremely
small.


5.4.1 Random Cramer-Rao Bound

In general, if the joint probability measure on the initial condition $x(0)$ and observation set
$Y$ has a corresponding PDF $p(Y, x(0))$, then the error correlation matrix $P_R(\hat{x}(0))$ for
any estimator $\hat{x}(0)$ of $x(0)$ is given by

$$P_R(\hat{x}(0)) = E_{Y,x(0)}\left\{[\hat{x}(0) - x(0)][\hat{x}(0) - x(0)]^T\right\} \tag{5.34}$$

$$= \int\!\!\int [\hat{x}(0) - x(0)][\hat{x}(0) - x(0)]^T \, p(Y, x(0)) \, dY \, dx(0) \tag{5.35}$$

where, as indicated above, $E_{Y,x(0)}$ denotes expectation over the joint PDF $p(Y, x(0))$. The
trace of $P_R(\hat{x}(0))$ yields the total mean-squared (estimation) error (MSE) for the
components of $x(0)$. The Cramer-Rao bound on $P_R(\hat{x}(0))$, hereafter denoted the random
Cramer-Rao bound, is given by


$$P_R(\hat{x}(0)) \geq J_R^{-1}(x(0)) \tag{5.36}$$


where $J_R(x(0))$, the Fisher information matrix for the random parameter vector $x(0)$,
hereafter denoted the random Fisher information matrix, is given by [86]


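$$J_R(x(0)) = E_{Y,x(0)}\left\{ D_{x(0)}^T\{\log p(Y, x(0))\} \, D_{x(0)}\{\log p(Y, x(0))\} \right\} \tag{5.37}$$

or equivalently by

$$J_R(x(0)) = E_{Y,x(0)}\left\{ D_{x(0)}^T\{\log p(Y \mid x(0))\} \, D_{x(0)}\{\log p(Y \mid x(0))\} \right\} + E_{x(0)}\left\{ D_{x(0)}^T\{\log p(x(0))\} \, D_{x(0)}\{\log p(x(0))\} \right\} \tag{5.38}$$

$$= E_{x(0)}\{J(x(0))\} + E_{x(0)}\left\{ D_{x(0)}^T\{\log p(x(0))\} \, D_{x(0)}\{\log p(x(0))\} \right\} \tag{5.39}$$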
where $J(x(0))$ has the same form as the Fisher information matrix for nonrandom parameters used
in the previous section. Use of the random Cramer-Rao bound requires that $p(Y, x(n_0))$ exist
and satisfy certain regularity constraints, specifically that it be twice differentiable with
respect to $x(n_0)$ and that both derivatives be integrable with respect to both $x(n_0)$ and $Y$.

Since $p(x(0))$ is a Gaussian PDF with covariance matrix $\gamma^2 I_N$, it follows that

$$E_{x(0)}\left\{ D_{x(0)}^T\{\log p(x(0))\} \, D_{x(0)}\{\log p(x(0))\} \right\} = \gamma^{-2} I_N \tag{5.40}$$

so that for the problem of interest here $J_R(x(0))$ reduces to

$$J_R(x(0)) = E_{x(0)}\{J(x(0))\} + \gamma^{-2} I_N \tag{5.41}$$

As indicated by (5.41), the first component of $J_R(x(0))$ involves an averaging of matrices,
where each matrix has the same form as the Fisher information matrix for a nonrandom parameter.
As suggested by the simulation results that follow, because of this averaging the behavior of
the random Fisher information matrix for initial condition estimators with dissipative, chaotic
maps differs considerably from that of the nonrandom Fisher information matrix (i.e., the Fisher
information matrix for nonrandom parameters). The results indicate that this averaging can
cause the random Fisher information matrix to exhibit undesirable behavior which severely
limits the value of the random Cramer-Rao bound for dissipative, chaotic systems.
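Identity (5.40) can be checked by Monte Carlo. For p(x(0)) Gaussian with covariance γ²I, the gradient of log p is -(x(0) - mean)/γ², and the average of its outer product approaches γ⁻²I. The sketch below uses a three-dimensional state (as for the Lorenz flow); the values of γ and the prior mean are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
gamma, dim, n_samples = 0.5, 3, 200_000
prior_mean = np.array([1.0, -2.0, 20.0])   # hypothetical prior mean vector

# Draw x(0) ~ N(prior_mean, gamma^2 I).
x0 = prior_mean + gamma * rng.standard_normal((n_samples, dim))
# Gradient of log p(x(0)) for this Gaussian prior.
score = -(x0 - prior_mean) / gamma**2
# Monte Carlo estimate of the left-hand side of (5.40).
prior_info = score.T @ score / n_samples
print(np.round(gamma**2 * prior_info, 2))  # approximately the 3 x 3 identity
```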

Figures 5-18 and 5-19 depict the eigenvalues of $J_R^{-1}(x(0))$ for the time-sampled Lorenz flow
as a function of $N$ for input SNRs of 20 dB and 30 dB, respectively. For these results and all
the results reported in this section, the expectation over $x(0)$ in (5.41) was performed using
Monte Carlo simulation, with $J(x(0))$ calculated for 1000 samples of $x(0)$ randomly selected
according to $p(x(0))$ and the resulting matrices averaged. As indicated by both figures, the
behavior of the eigenvalues differs considerably for the two values of $\gamma$, with the
behavior for the smaller value similar to that observed earlier for nonrandom parameters. For
the larger value of $\gamma$, all of the eigenvalues decrease with increasing $N$, with none
exhibiting the limiting behavior exhibited by those for the smaller value of $\gamma$.
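The Monte Carlo evaluation of (5.41) described above can be sketched as follows. To keep the example short, it uses the Henon map (standard parameters a = 1.4, b = 0.3) instead of the time-sampled Lorenz flow, and it takes the Henon fixed point as a hypothetical prior mean. It also assumes, as a simplification not taken from the text, that samples whose orbits leave the box |x|, |y| ≤ 10 are rejected. J(x(0)) is formed for future observations Y(0, N) with identity output and noise covariance σ²I. The function names are hypothetical.

```python
import numpy as np

A, B = 1.4, 0.3  # standard Henon parameters (assumed)

def fisher_info_future(x0, N, sigma2):
    """sigma^{-2} sum_{n=0}^{N} (Df^n)^T Df^n, or None if the orbit escapes."""
    info = np.zeros((2, 2))
    D = np.eye(2)
    v = np.asarray(x0, dtype=float)
    for _ in range(N + 1):
        info += D.T @ D
        D = np.array([[-2.0 * A * v[0], 1.0], [B, 0.0]]) @ D
        v = np.array([1.0 - A * v[0] ** 2 + v[1], B * v[0]])
        if np.abs(v).max() > 10.0:       # treated as leaving the basin (assumption)
            return None
    return info / sigma2

def random_crb_eigs(mean, gamma, N, sigma2, n_samples=1000, seed=0):
    """Sorted eigenvalues of J_R^{-1}(x(0)) from (5.41), with E{J(x(0))}
    estimated by averaging over draws of x(0) ~ N(mean, gamma^2 I)."""
    rng = np.random.default_rng(seed)
    mats = []
    while len(mats) < n_samples:
        J = fisher_info_future(mean + gamma * rng.standard_normal(2), N, sigma2)
        if J is not None:
            mats.append(J)
    J_R = np.mean(mats, axis=0) + np.eye(2) / gamma**2
    return np.sort(np.linalg.eigvalsh(np.linalg.inv(J_R)))

# Henon fixed point (it lies on the attractor), used as a hypothetical prior mean.
xs = (-(1 - B) + np.sqrt((1 - B) ** 2 + 4 * A)) / (2 * A)
mean = np.array([xs, B * xs])
for gamma in (0.01, 0.1):
    print(gamma, random_crb_eigs(mean, gamma, N=6, sigma2=0.01))
```

Because J_R is at least γ⁻²I, every eigenvalue of J_R⁻¹ is at most γ². For small γ the prior term can dominate, so the interesting regime is where σ⁻² times the accumulated derivative terms is comparable to or larger than γ⁻².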

Figure 5-18: Eigenvalues of random Cramer-Rao bound for initial condition estimators of
time-sampled Lorenz flow with observation set $Y(0, N)$ and input SNR of 20 dB.
(a) $\gamma = 0.1$; (b) $\gamma = 1$.

Figure 5-19: Eigenvalues of random Cramer-Rao bound for initial condition estimators of
time-sampled Lorenz flow with observation set $Y(0, N)$ and input SNR of 30 dB.
(a) $\gamma = 0.1$; (b) $\gamma = 1$.

A nonrigorous explanation for this eigenvalue behavior is the following. Although the Lyapunov
exponents of a chaotic system are the same for nearly all initial conditions, the local
manifolds or directions associated with these exponents differ at each point. With
observations of future state values and a fixed initial condition $x(0)$, the Fisher information
(as given by $J(x(0))$) along the directions associated with negative Lyapunov exponents remains
small as the number of future observations increases, with a similar relation between the
directions associated with positive Lyapunov exponents and past observations. Equivalently,
$J(x(0))$ has bounded eigenvalues, one for each negative Lyapunov exponent, which thus have
bounded, nonzero inverses, even as the number of future observations becomes large. However,
the eigenvectors associated with these eigenvalues differ for different values of $x(0)$. All
other eigenvalues increase without bound as the number of future observations increases. When
the expectation is taken over $x(0)$, and there is thus an averaging of the matrices $J(x(0))$,
the resulting averaged Fisher information is not small in any direction for the larger value
of $\gamma$. Equivalently, both the large and small eigenvalues of the individual matrices
$J(x(0))$ influence each eigenvalue of $E_{x(0)}\{J(x(0))\}$, with computer experiments
suggesting that each of these eigenvalues increases as the number of future observations
increases. Since the eigenvalues of $J_R^{-1}(x(0))$ are the reciprocals of these eigenvalues
after each is incremented by $\gamma^{-2}$, the result is that all eigenvalues of
$J_R^{-1}(x(0))$ decrease with increasing $N$. In contrast, for the smaller value of $\gamma$,
the local manifolds associated with the Lyapunov exponents differ very little across the values
of $x(0)$ used in the averaging, so that the average of the matrices $J(x(0))$ has eigenvalue
properties similar to those of each matrix $J(x(0))$ individually.
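The averaging mechanism in this explanation can be isolated with a toy example. Each hypothetical 2 × 2 "information matrix" below has a large eigenvalue (10⁴) in one direction and a small one (10⁻²) in the orthogonal direction, and the direction varies from sample to sample by an angle drawn uniformly from an interval of width `spread`. These matrices are invented for illustration and are not computed from any chaotic system. When the directions barely vary, the average keeps the small eigenvalue. When they vary appreciably, the large eigenvalues leak into every direction and the smallest eigenvalue of the average grows by orders of magnitude.

```python
import numpy as np

def rotation(theta):
    """2 x 2 rotation matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

big, small = 1e4, 1e-2                 # hypothetical eigenvalues of each sample matrix
base = np.diag([big, small])

def averaged_min_eig(spread, n=1000, seed=0):
    """Smallest eigenvalue of the average of n randomly rotated copies of base."""
    rng = np.random.default_rng(seed)
    thetas = spread * rng.uniform(-0.5, 0.5, size=n)
    mats = [rotation(t) @ base @ rotation(t).T for t in thetas]
    return np.linalg.eigvalsh(np.mean(mats, axis=0))[0]

for spread in (0.0, 0.01, 0.1, 1.0):
    print(spread, averaged_min_eig(spread))
```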

This averaging of information matrices can lead to undesirable behavior by the Cramer-Rao
bound. Figures 5-20 (a) and (b) depict the superimposed eigenvalue sums of $J_R^{-1}(x(0))$ for
the two values of $\gamma$ used for the earlier figures, with input SNRs of 30 dB and 20 dB,
respectively. In both figures, there is a threshold value of $N$ above which the eigenvalue sum
is smaller for the larger value of $\gamma$. In addition, this threshold is smaller with the
larger input SNR. The figures indicate that when $N$ increases beyond a threshold, the bound on
the MSE is smaller when there is greater a priori uncertainty in the initial condition $x(0)$.
If the bound were achievable, the nonsensical implication would be that achievable state
estimator performance improves with decreasing a priori knowledge of the actual value of the
initial condition. Furthermore, this undesirable behavior of the bound becomes more pronounced
as the input SNR increases. Consequently, the random Cramer-Rao bound on the performance of
initial condition estimators of dissipative, chaotic maps is least effective with input SNRs
for which the Cramer-Rao bound is generally most effective.

Figure 5-20: Summed eigenvalues of random Cramer-Rao bound for initial condition estimators of
time-sampled Lorenz flow with observation set $Y(0, N)$. (a) Input SNR = 30 dB; (b) Input SNR
= 20 dB.


5.4.2 Weiss-Weinstein Bound

Just as the nonrandom Cramer-Rao bound has a random counterpart, the Barankin bound has random
counterparts as well [9, 88, 89, 90], the most general of which is the Weiss-Weinstein class of
performance bounds for estimators of random parameters. In contrast to the probabilistic
setting for the Barankin bound provided in Section 5.2.2, the probabilistic setting for the
Weiss-Weinstein bound involves two probability spaces: the observation space $(\Omega, \beta, \mu)$
(as with the Barankin bound) and the parameter space $(\Theta, \eta, \nu)$, where
$\Theta = \{\theta\}$ is a set of parameter values, $\eta$ is a $\sigma$-algebra of subsets of
$\Theta$, and $\nu$ is a probability measure defined on $\eta$. An underlying assumption is that
the joint probability measure on the two spaces is absolutely continuous with respect to the
product measure, so that the joint density function $p(\omega, \theta)$ exists. As shown in [90]
(with slightly different notation), for any real-valued function $\hat{g}(\omega)$ that is
measurable on $\Omega$, the following inequality holds for all finite $n$, real constants $a_i$,
offsets $z_i$, and exponents $s_i$ satisfying $0 < s_i < 1$, for $i = 1, \ldots, n$:

$$E_{\omega,\theta}\left\{\left(\hat{g}(\omega) - g(\theta)\right)^2\right\} \geq \frac{\left[\sum_{i=1}^{n} a_i \, E_{\omega,\theta}\left\{\left[g(\theta - z_i) - g(\theta)\right] L^{1-s_i}(\omega, \theta - z_i, \theta)\right\}\right]^2}{E_{\omega,\theta}\left\{\left[\sum_{i=1}^{n} a_i \left(L^{s_i}(\omega, \theta + z_i, \theta) - L^{1-s_i}(\omega, \theta - z_i, \theta)\right)\right]^2\right\}} \tag{5.42}$$
where

$$L(\omega, \theta_1, \theta_2) = \frac{p(\omega, \theta_1)}{p(\omega, \theta_2)} \tag{5.43}$$

for $\theta_1, \theta_2 \in \Theta$, and where $E_{\omega,\theta}$ denotes expectation over the joint
density $p(\omega, \theta)$. The above inequality, the Weiss-Weinstein bound, is the counterpart
to (5.19), the inequality corresponding to the Barankin bound. Note that whereas constant test
points corresponding to other parameter values are used in the Barankin bound, constant offsets
$z_i$ are used in the Weiss-Weinstein bound.


Appendix A provides the equations for a restricted form of the Weiss-Weinstein bound for
vector-valued parameters for the problem scenario of interest in this chapter. As shown in the
appendix, analogous to the restricted form of the Barankin bound considered earlier in the
chapter, this restricted form of the Weiss-Weinstein bound is expressible as a sum of two
components: the inverse of the random Fisher information matrix, and a matrix which depends on
the test offsets, the observation noise covariance matrix, and the number of observations.
Figure 5-21 depicts the random Cramer-Rao and restricted Weiss-Weinstein bounds for the same
scenario used for Figure 5-20.
Figure 5-21: Summed eigenvalues of restricted Weiss-Weinstein and random Cramer-Rao bounds for
initial condition estimators of time-sampled Lorenz flow with observation set $Y(0, N)$.
(a) Input SNR = 30 dB; (b) Input SNR = 20 dB.


A single test offset was used in the Weiss-Weinstein bound, with the offset carefully chosen by
trial and error to yield as tight a bound as possible. Additional experiments with more than one
test offset offered little if any improvement. As indicated by the figures, the undesirable
behavior exhibited by the random Cramer-Rao bound is also exhibited by the restricted
Weiss-Weinstein bound, in the sense that there is a threshold value of $N$ above which the bound
on the MSE is smaller when there is greater a priori uncertainty in the initial condition $x(0)$.
As such, the undesirable behavior of the random Cramer-Rao bound with larger input SNRs is not
avoided by this generalization of the bound.

The question arises as to the existence of a performance bound for estimators of random
parameters which avoids the undesirable behavior of the random Cramer-Rao bound. It is
straightforward to derive a rate-distortion bound for the problem of interest here, either by
specializing the more general bounds introduced in [28, 92] or by using the theory developed in
[7] as a foundation. Experiments have suggested that the resulting bound avoids the undesirable
behavior of the random Cramer-Rao bound, but that it is a much weaker bound than the Cramer-Rao
bound except with extremely small input SNRs, and thus has little value with the input SNRs of
interest here. Alternatively, the random Cramer-Rao bound need not be incorporated in the
Weiss-Weinstein bound as was done for the results presented above. However, computer
experiments have indicated that with input SNRs above 20 dB, the Weiss-Weinstein bound is an
extremely weak bound when it does not incorporate the random Cramer-Rao bound.


Chapter 6


MC  Maps  and  Signal  Synthesis 


6.1  Introduction 

Thus far in this thesis, we have focused on dissipative, chaotic systems and state estimation
with these systems. Various topological and ergodic properties of dissipative, chaotic
systems have been considered, including the presence of both positive and negative Lyapunov
exponents, boundedness of attractors, and the existence of invariant measures; and the
influence of these properties on achievable state estimator performance has been addressed.
Until now, we have not considered properties unique to individual chaotic systems, nor have
we discussed how one might synthesize a chaotic signal with specified properties.

In contrast to the previous chapters, this chapter focuses on signal synthesis, in particular
the synthesis of maps with properties that facilitate the detection and estimation of
noise-corrupted orbit segments generated by these maps. We introduce a class of maps that
are amenable to analysis and that, although deterministic, generate potentially useful random
processes. In the next chapter, we exploit this random-process-generation property of these
maps to derive computationally efficient, practical, optimal and suboptimal detection and
estimation algorithms involving these maps, and we briefly speculate on the use of these
maps and detection algorithms for secure communication.

Unlike the maps considered in Chapters 4 and 5, the maps considered in this chapter
and the next are neither dissipative nor invertible, and the attractors are not fractals but
simple subsets of $\mathbb{R}$. We also seldom use the word chaos in this chapter, thereby avoiding
the difficulties encountered in Chapter 2 in properly defining the word and in determining
whether a given system satisfies a given definition. However, many of the maps considered in
this chapter have a positive Lyapunov exponent, bounded orbits, and sensitive dependence
on initial conditions and would thus satisfy many definitions of chaos.

Our interest in the maps discussed in this chapter arose in an effort to better understand
the behavior of certain state estimation algorithms involving Kalman filtering and
hidden Markov modeling with dissipative, chaotic maps. However, because of the rich set of
properties these simple, one-dimensional maps exhibit, the maps themselves have potential,
practical value and are worthwhile to investigate independently of dissipative systems.

The  chapter  begins  by  introducing  MC  maps,  a  class  of  piecewise  linear  maps  of  the  unit 
interval  onto  itself  which  give  rise  to  Markov  chains  and  continues  by  presenting  previously 
reported  and  newly  discovered  properties  of  MC  maps.  Finally,  the  chapter  discusses  the  use 
of  MC  maps  for  synthesizing  maps  having  specified  stationary  PDFs  and  for  synthesizing 
multidimensional  maps  which  also  give  rise  to  Markov  chains. 

A notational convention used in this chapter and the next is that for real numbers $x$ and
$y$ satisfying $x < y$, $[x, y]$ and $]x, y[$ respectively denote the closed and open intervals of the
real line $\mathbb{R}$ with $x$ and $y$ as endpoints. Similarly, $]x, y]$ and $[x, y[$ respectively denote
left-open, right-closed and left-closed, right-open subintervals of $\mathbb{R}$. Also, $\lambda$ denotes Lebesgue
measure, whereas $\mu$ denotes an arbitrary measure.


6.2  Markov  Maps  and  MC  Maps 

The  maps  of  principal  interest  in  this  chapter  and  the  next  comprise  a  subset  of  a  much 
larger  class  of  maps  of  intervals  of  the  real  line  to  themselves  known  as  Markov  maps.  The 
formal  definition  of  a  Markov  map  is  the  following: 

Markov Map: [12, 25] A piecewise continuous map $f$ of an interval $I = [i_0, i_n]$ to itself for
which there exists a set of points $P = \{i_1, i_2, \ldots, i_{n-1}\}$, known as partition points, satisfying
$i_0 < i_1 < \cdots < i_{n-1} < i_n$ and such that the following three conditions hold:

1. For $j = 0, 1, \ldots, n-1$, the restriction of $f$ to the open subinterval $]i_j, i_{j+1}[$ is a
homeomorphism (i.e., a continuous, invertible mapping with continuous inverse) onto
another subinterval $]i_{k(j)}, i_{l(j)}[$, where $i_{k(j)}$ and $i_{l(j)}$ are elements of the set
$\{i_0, i_1, \ldots, i_n\}$ and $k(j) < l(j)$.

2. $f(P) \subset P$, which means that partition points are mapped to partition points.

3. $f(i_j^-), f(i_j^+) \in P$, where $f(i_j^-) = \lim_{\delta \to 0} f(i_j - \delta)$, $f(i_j^+) = \lim_{\delta \to 0} f(i_j + \delta)$, and $\delta > 0$.
This means that the left and right limits of $f$ evaluated at each partition point are
also partition points.
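For piecewise affine maps, the three conditions can be checked mechanically with exact rational arithmetic. The following Python sketch is illustrative only: the segment representation (x0, x1, y0, y1), the function names, and the convention that a breakpoint is evaluated on the piece to its right are our own assumptions, not part of the definition. Unlike the set $P$ above, the list of points passed to the checker includes the endpoints 0 and 1.

```python
from fractions import Fraction as F

# Hypothetical representation: a piecewise affine map of [0, 1] is a list of
# segments (x0, x1, y0, y1); [x0, x1] is mapped affinely with x0 -> y0 and
# x1 -> y1. Segments are listed left to right and tile [0, 1].

def affine(seg, x):
    """Evaluate the affine piece `seg` at x (also gives one-sided limits at its ends)."""
    x0, x1, y0, y1 = seg
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def value(segs, x):
    """f(x), using the convention that a breakpoint belongs to the piece on its right."""
    for seg in segs:
        if seg[0] <= x < seg[1] or x == seg[1] == 1:
            return affine(seg, x)
    raise ValueError("x lies outside [0, 1]")

def is_markov_partition(segs, points):
    """Check conditions 1-3 for the partition with the given points (0 and 1 included)."""
    pts = sorted(set(points))
    pset = set(pts)
    # Condition 2: every partition point is mapped to a partition point.
    if any(value(segs, q) not in pset for q in pts):
        return False
    for a, b in zip(pts, pts[1:]):
        seg = next((s for s in segs if s[0] <= a and b <= s[1]), None)
        # Condition 1: f must be a single nonconstant affine piece on ]a, b[,
        # hence a homeomorphism onto an open interval.
        if seg is None or seg[2] == seg[3]:
            return False
        # Conditions 1 and 3: the one-sided limits f(a+) and f(b-), which are
        # the endpoints of the image interval, must be partition points.
        if affine(seg, a) not in pset or affine(seg, b) not in pset:
            return False
    return True

def uniform(M):
    """Partition points of the uniform partition of [0, 1] into M subintervals."""
    return [F(k, M) for k in range(M + 1)]

# Assumed example: the doubling map x -> 2x mod 1.
doubling = [(F(0), F(1, 2), F(0), F(1)), (F(1, 2), F(1), F(0), F(1))]
print(is_markov_partition(doubling, uniform(4)))  # True
print(is_markov_partition(doubling, uniform(3)))  # False
```

The second call fails because the point 1/2, where the two affine pieces of the doubling map meet, lies inside the element ]1/3, 2/3[, so condition 1 cannot hold there.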

Intuitively, this abstract definition means that a Markov map is a piecewise continuous
map of an interval $I$ to itself for which one can find a partition of $I$ into (nonoverlapping)
subintervals such that each subinterval is mapped nicely onto a union of other subintervals in
the partition, and such that the endpoints of these subintervals are mapped onto endpoints
of other subintervals in the partition. Implicit in the above definition is the fact that $f$ gives
rise to the deterministic state equation $x(n+1) = f(x(n))$, where $x(n) \in I$. We provide
examples of Markov maps later in the section.

A Markov partition is any finite partition of the interval $I = [i_0, i_n]$ into subintervals
for which $f$ satisfies the three conditions for a Markov map. Since partition elements are
subintervals of $\mathbb{R}$, we use the terms partition elements and subintervals interchangeably in
this chapter to refer to these elements. A Markov map may have many Markov partitions,
and in general it is difficult both to determine whether a given map is a Markov map and to find
a Markov partition for a given Markov map. An important consideration, which underlies
the detection algorithms introduced in the next chapter, is that certain Markov maps have
an infinite set of Markov partitions that are straightforward to determine.

Of particular interest in this chapter is a small subset of Markov maps for which each
element $f$ of the subset satisfies the following additional constraint:

• $f$ is piecewise linear and there exists a Markov partition for which $f$ is affine on each
partition element. A piecewise linear map of an interval $I$ to itself is a map that is
affine on each subinterval in a set of subintervals partitioning $I$. An affine map is a
map of the form $f(x) = \tau x + \beta$, where $\tau$ and $\beta$ are real-valued constants.

We use the term MC map to denote an element of this subset of Markov maps, because,
as shown later in the section, such a map gives rise to homogeneous, finite-state Markov
chains. Also, we use the term EMC map to denote an MC map that satisfies the following
additional constraint:

• The map is eventually locally expanding in the sense that there exists an integer $n$
such that $|(f^n)'(x)| > 1$ at all points $x$ where $f^n$ is differentiable.
Because it satisfies this additional constraint, an EMC map has a positive Lyapunov
exponent and in general exhibits sensitive dependence on initial conditions. A fact we exploit
later in the chapter is that this additional constraint is satisfied by any piecewise linear map
for which the slope of each affine segment is an integer with absolute value greater than
one.

Figures 6-1 (a) and (b) depict two EMC maps of the unit interval. Since $x(n+1) = f(x(n))$,
the figures depict the function $f$ defining the map. Figure 6-2 (a) depicts an orbit
segment $\{x(i)\}$ generated by the map in Figure 6-1 (a), and Figure 6-2 (b) depicts an
orbit segment generated by the map in Figure 6-1 (b).


Figure 6-1: Two EMC maps


Figure 6-2: Typical orbit segments for respective maps in Figures 6-1 (a) and (b) (horizontal axis: time n)


A Markov partition for the map shown in Figure 6-1 (a) is given by any division of the
unit interval into $2N$ equal-length subintervals, where $N$ is a positive integer. Similarly, a
Markov partition for the map shown in Figure 6-1 (b) is given by any division of the unit
interval into $4N$ equal-length subintervals. For example, the partition given by the four
equal-length subintervals $\{[0, .25[, [.25, .5[, [.5, .75[, [.75, 1]\}$ is a Markov partition for each
map.

A useful property of MC maps, which we exploit later in this chapter, is that under
certain conditions on the distribution of the initial condition $x(0)$, the maps give rise to
Markov chains. As shown in [12, 61], a Markov map $f$ of the unit interval gives rise to a
Markov chain if the initial condition has a constant PDF over the unit interval and there
exists a Markov partition $P = \{I_j\}$ satisfying the following two constraints:

1. $f(x) = \tau_{jk}\, x + \beta_{jk}$ for $x \in I_{jk}$ and real constants $\tau_{jk}$, $\beta_{jk}$

2. $f(I_{jk}) = I_k$ almost everywhere if $I_{jk} \neq \emptyset$

where $I_{jk}$ denotes the points of $I_j$ that are mapped to $I_k$ by $f$. By definition, an MC map
satisfies these two constraints and thus gives rise to a Markov chain if the initial condition is
appropriately chosen. As we show later, the constraint on the PDF of the initial condition
can be relaxed. We also show that many other Markov maps, both one-dimensional and
multidimensional, that do not satisfy the above constraints give rise to Markov chains as
well.

For a given MC map defined on the unit interval $[0, 1]$ and a given Markov partition
$\{I_j\}$ for which the above two constraints are satisfied, each partition element $I_j$ corresponds
to a state $S_j$ of a Markov chain. Since the restriction of $f$ to $I_j$ is an affine transformation,
$I_{jk}$ is either a subinterval of $I_j$ or the empty set. In fact, a readily verified relation
we use later in the chapter is $I_{jk} = I_j \cap f^{-1}(I_k)$. The transition probability
from state $S_j$ to state $S_k$ equals the fraction of points in $I_j$ that are mapped to $I_k$, which
is given by $\lambda(I_{jk})/\lambda(I_j)$, where $\lambda$ is Lebesgue measure.

For  the  Markov  maps  shown  in  Figure  6-1  (a)  and  (b),  with  the  Markov  partition  given 
by  {[0,  .25[,  [.25,  .5[,  [.5,  .75[,  [.75, 1]},  the  matrices  of  state  transition  probabilities,  hereafter 
referred  to  as  transition  probability  matrices  (TPMs),  are  the  following,  respectively: 


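$$
\begin{bmatrix}
.5 & .5 & 0 & 0 \\
0 & 0 & .5 & .5 \\
.5 & .5 & 0 & 0 \\
0 & 0 & .5 & .5
\end{bmatrix},
\qquad
\begin{bmatrix}
.25 & .25 & .25 & .25 \\
.25 & .25 & .25 & .25 \\
.25 & .25 & .25 & .25 \\
.25 & .25 & .25 & .25
\end{bmatrix}
\qquad (6.1)
$$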
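As a check on (6.1), the transition probabilities $p_{jk} = \lambda(I_j \cap f^{-1}(I_k))/\lambda(I_j)$ can be computed exactly for any piecewise affine map with nonzero slopes. The Python sketch below assumes, for illustration only, that the maps of Figures 6-1 (a) and (b) are $x \mapsto 2x \bmod 1$ and $x \mapsto 4x \bmod 1$; these choices are consistent with (6.1) but are not stated in the text. The segment representation and function names are hypothetical.

```python
from fractions import Fraction as F

# Hypothetical segment representation (x0, x1, y0, y1): [x0, x1] maps affinely
# with x0 -> y0 and x1 -> y1. All slopes are assumed to be nonzero.

def overlap(a, b, c, d):
    """Lebesgue measure of [a, b] intersected with [c, d]."""
    return max(F(0), min(b, d) - max(a, c))

def tpm(segs, points):
    """p_jk = lambda(I_j ∩ f^{-1}(I_k)) / lambda(I_j) for the cells defined by `points`."""
    cells = list(zip(points, points[1:]))
    P = []
    for a, b in cells:                          # I_j = [a, b]
        row = []
        for c, d in cells:                      # I_k = [c, d]
            mass = F(0)
            for x0, x1, y0, y1 in segs:
                lo, hi = max(a, x0), min(b, x1)     # part of I_j on this piece
                if hi <= lo:
                    continue
                slope = (y1 - y0) / (x1 - x0)
                ya, yb = y0 + slope * (lo - x0), y0 + slope * (hi - x0)
                # The preimage of I_k inside [lo, hi] has length |image ∩ I_k| / |slope|.
                mass += overlap(min(ya, yb), max(ya, yb), c, d) / abs(slope)
            row.append(mass / (b - a))
        P.append(row)
    return P

quarters = [F(k, 4) for k in range(5)]
doubling = [(F(0), F(1, 2), F(0), F(1)), (F(1, 2), F(1), F(0), F(1))]
quadrupling = [(F(k, 4), F(k + 1, 4), F(0), F(1)) for k in range(4)]
for row in tpm(doubling, quarters):
    print([str(p) for p in row])
```

The loop prints the rows of the first matrix in (6.1); replacing `doubling` by `quadrupling` gives the second. Exact fractions are used so that the entries can be compared with (6.1) without rounding error.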


The dynamics of the Markov chain arise as follows. Let $\{I_j\}$ denote the elements of a
Markov partition for $f$, and let $\{S_j\}$ denote the states of the Markov chain corresponding
to this partition, where state $S_j$ is associated with partition element $I_j$. Also, let $x(n)$
denote the state or orbit point of $f$ at time $n$, i.e., $x(n) = f^n(x(0))$. To avoid confusion or
ambiguity, we henceforth use the word state only in reference to Markov chains. If $x(n) \in I_j$,
the Markov chain is said to be in state $S_j$ at time $n$. If the initial condition is a random
variable with appropriate distribution (e.g., constant PDF), the state sequence that arises
as a result of this mapping between orbit points and states is a first-order Markov process
with transition probabilities defined as above.

More formally, consider the random sequence $\{S(n)\}$, where $S(n)$ denotes the state
corresponding to the partition element in which $x(n)$ lies (i.e., $S(n) = S_j$ if $x(n) \in I_j$).
Also, let $P(S(0))$ denote the initial state distribution given by $P(S(0) = j) = \lambda(I_j)$, and let
$P(S(0), S(1), \ldots, S(N))$ denote the joint probability of $S(0), S(1), \ldots, S(N)$. The condition
on $P(S(0))$ reflects the assumption that the initial condition $x(0)$ is a random variable with constant PDF
over the unit interval. Under these conditions, the following Markov property holds:

$$P(S(0), S(1), \ldots, S(N)) = P(S(0))\, P(S(1) \mid S(0)) \cdots P(S(N) \mid S(N-1)) \qquad (6.2)$$

where

$$P([S(i) = k] \mid [S(i-1) = j]) = \frac{\lambda(I_{jk})}{\lambda(I_j)} \qquad (6.3)$$
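The Markov property (6.2) can be probed empirically. The Python sketch below assumes, for illustration only, that the map of Figure 6-1 (a) is $x \mapsto 2x \bmod 1$ (consistent with the first matrix in (6.1)), draws $x(0)$ with constant PDF, and estimates $P(S(2) = k \mid S(1) = j, S(0) = i)$. If (6.2) and (6.3) hold, these estimates should be close to $p_{jk}$ for every $i$. The sample size, seed, and helper names are arbitrary choices of ours.

```python
import random
from collections import Counter

# Assumption: the map of Figure 6-1 (a) is taken to be x -> 2x mod 1, which is
# consistent with the first TPM in (6.1); the figure itself is not reproduced.
P_A = [[.5, .5, 0, 0], [0, 0, .5, .5], [.5, .5, 0, 0], [0, 0, .5, .5]]

def doubling(x):
    return (2.0 * x) % 1.0   # exact in binary floating point for x in [0, 1)

def state(x, m=4):
    """Index j of the cell I_j = [j/m, (j+1)/m[ containing x."""
    return min(int(m * x), m - 1)

def second_order_estimates(n_samples=200_000, seed=1):
    """Estimate P(S(2)=k | S(1)=j, S(0)=i) from short orbits with uniform x(0)."""
    rng = random.Random(seed)
    triples, pairs = Counter(), Counter()
    for _ in range(n_samples):
        x0 = rng.random()                     # constant PDF on [0, 1)
        x1 = doubling(x0)
        x2 = doubling(x1)
        s = (state(x0), state(x1), state(x2))
        triples[s] += 1
        pairs[s[:2]] += 1
    return {s: c / pairs[s[:2]] for s, c in triples.items()}

est = second_order_estimates()
worst = max(abs(p - P_A[j][k]) for (i, j, k), p in est.items())
print(f"largest deviation from p_jk: {worst:.4f}")
```

Only two iterations are used per orbit because each doubling of a double-precision number discards one mantissa bit, so long floating-point orbits of this map collapse to 0 after roughly 50 iterations.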

In  the  next  chapter,  we  exploit  the  relation  between  MC  maps  and  Markov  chains  to  derive 
algorithms  for  detecting  noise-corrupted  orbit  segments  generated  by  these  maps. 

EMC maps are among the few maps exhibiting the properties associated with chaos
that are also amenable to analysis. Analysis of these maps is facilitated by their piecewise
linearity. Despite this simplicity, these maps have many interesting, potentially useful properties and
may be useful building blocks for other signals. In the next four sections, we consider properties
of EMC maps, and more generally MC maps, that are relevant to detection and estimation
applications involving these maps. Among the properties we consider is the relative ease of
synthesizing an MC map, with integer-valued slopes for the affine segments, that gives rise
to any specified Markov chain having a TPM with rational-valued entries. The
significance of integer-valued slopes is that any MC map having this property gives rise to
arbitrarily fine Markov partitions with equally sized partition elements, a fact we prove
later in the chapter. We also merge previously published results to establish a close relation
between the properties of EMC maps and the Markov chains they give rise to. In particular,
we show that ergodicity of an EMC map is equivalent to irreducibility of the TPMs of all
Markov chains the map gives rise to, and exactness of an EMC map is equivalent to primitivity
of the TPMs of all Markov chains the map gives rise to. We also show that the Markov
chain property of EMC maps is preserved under homeomorphisms, a result that allows
the synthesis of a Markov map that gives rise to any specified Markov chain and that
has any specified stationary PDF. Finally, we show how one can use this homeomorphism
property to design nontrivial multidimensional maps that give rise to Markov chains.

In  the  next  section,  we  focus  on  the  relation  between  MC  maps  and  the  Markov  chains 
they  give  rise  to,  whereas  in  Section  6.4  we  deal  with  the  relation  between  these  maps 
and  their  associated  Markov  partitions.  Section  6.5  is  devoted  to  the  ergodic  properties 
of  EMC  maps,  and  Section  6.6  deals  with  the  synthesis  of  multidimensional  MC  maps. 
Various  theorems  and  propositions  are  presented,  with  proofs  provided  for  those  that  are 
new.  Several  of  the  proofs  are  informal,  with  the  essential  elements  of  the  formal  proofs 
provided  but  the  excessive,  unrevealing  details  omitted. 


6.3  EMC  Maps  and  Markov  Chains 

As originally shown in [43, p. 294, Theorem 5] and later in [46, 80], given the transition
probability matrix (TPM) of any homogeneous, finite-state Markov chain and any vector
of nonzero initial state probabilities, one can synthesize a piecewise linear map of the unit
interval onto itself that gives rise to that Markov chain. The map will be a Markov map,
but may or may not be an MC map depending upon the specified Markov chain. Thus,
Markov maps are generators of homogeneous, finite-state Markov chains. We briefly outline
the design process, primarily to clarify the relation between piecewise linear Markov maps
and Markov chains. To facilitate an understanding of the design process, we apply the
method to a specific example while outlining the process.

Let $m$ denote the number of states of the desired Markov chain, and let the row vector
of state probabilities at time $n$ be denoted $\Pi(n) = [\pi_1(n), \pi_2(n), \ldots, \pi_m(n)]$, where $\pi_i(n)$
denotes the probability of being in state $i$ at time $n$. With this notation, $\Pi(0)$ denotes the
vector of initial state probabilities. Also, let the $m \times m$ TPM be denoted $P = [p_{ij}]_{i,j=1}^{m}$,
where $p_{ij}$ denotes the probability of transitioning from state $i$ to state $j$ in one time step.
For our example, we let $m = 3$, $\Pi(0) = [.25, .25, .5]$, and we let $P$ be given by


$$
P =
\begin{bmatrix}
.5 & .5 & 0 \\
0 & 0 & 1 \\
\tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{3}
\end{bmatrix}
\qquad (6.4)
$$


Now  perform  the  following  sequence  of  steps: 

1. Partition the unit interval into $m$ consecutive subintervals $\{I_j\}_{j=1}^{m}$, where $I_j$ has length
$\pi_j(0)$. Let $e_{j,l}$ and $e_{j,r}$ respectively denote the left and right endpoints of $I_j$. (Note
that $e_{j,r} = e_{j+1,l}$.) For our example, the set of (closed) subintervals $[e_{j,l}, e_{j,r}]$ is given
by $\{[0, .25], [.25, .5], [.5, 1]\}$.

2. Partition each subinterval $I_j$ into $m$ consecutive subintervals $\{I_{jk}\}_{k=1}^{m}$, where $I_{jk}$ has
length $\pi_j(0)\, p_{jk} = \lambda(I_j)\, p_{jk}$, which is the product of the transition probability from
state $j$ to state $k$ and the length of subinterval $I_j$. Also, let $e_{jk,l}$ and $e_{jk,r}$ respectively
denote the left and right endpoints of $I_{jk}$. (Note that if $p_{jk} = 0$, the subinterval $I_{jk}$
has zero length. Zero-length subintervals are included here only as a notational aid
to simplify the use of indices.) For our example, the sets of subintervals $[e_{jk,l}, e_{jk,r}]$
for fixed $j$ and increasing $k$ are given by

$$\{[e_{1k,l}, e_{1k,r}]\} = \{[0, .125], [.125, .25], [.25, .25]\} \qquad (6.5)$$

$$\{[e_{2k,l}, e_{2k,r}]\} = \{[.25, .25], [.25, .25], [.25, .5]\} \qquad (6.6)$$

$$\{[e_{3k,l}, e_{3k,r}]\} = \{[.5, \tfrac{2}{3}], [\tfrac{2}{3}, \tfrac{5}{6}], [\tfrac{5}{6}, 1]\} \qquad (6.7)$$

3. Now consider a map $f$ from the unit interval to itself as a function in the $(x, y)$-plane
which maps each point $x_0 \in [0, 1]$ to some point $y_0 \in [0, 1]$. Let $(x_0, y_0)$ denote
the relation $y_0 = f(x_0)$. For each subinterval $I_{jk}$ with nonzero length, draw the
line segment in the $(x, y)$-plane with left endpoint given by the $(x, y)$ pair $(e_{jk,l}, e_{k,l})$
and right endpoint given by the $(x, y)$ pair $(e_{jk,r}, e_{k,r})$. This corresponds to a linear
mapping of the subinterval $I_{jk}$ (on the $x$-axis) onto the subinterval $I_k$ (on the $y$-axis).
Figure 6-3 depicts the resulting piecewise linear map for our example. Special care
must be taken at each discontinuity point to ensure that the point maps to a single
point.
Figure 6-3: Synthesized MC map.


The critical aspects of the synthesis procedure are in specifying the lengths of the various
subintervals and in ensuring that each subinterval $I_{jk}$ maps linearly onto $I_k$. Note that since
$I_{jk}$ has length $\pi_j(0)\, p_{jk} = \lambda(I_j)\, p_{jk}$ and $I_k$ has length $\pi_k(0) = \lambda(I_k)$, it follows that $f'(I_{jk})$,
the slope of the affine mapping of $I_{jk}$ onto $I_k$, is given by

$$f'(I_{jk}) = \frac{\pi_k(0)}{\pi_j(0)\, p_{jk}} = \frac{\lambda(I_k)}{\lambda(I_j)\, p_{jk}} \qquad (6.8)$$
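The three steps and the slope formula (6.8) translate directly into code. The following Python sketch, with hypothetical function names and exact fractions to avoid rounding, builds only the increasing segments of step 3, skips zero-length subintervals as the text permits, and reproduces the subintervals (6.5)-(6.7) for the example of (6.4).

```python
from fractions import Fraction as F

def synthesize(pi0, P):
    """Steps 1-3: return the endpoints of the I_j and the affine segments
    (x0, x1, y0, y1) mapping each nonempty I_jk onto I_k (increasing)."""
    m = len(pi0)
    assert all(sum(row) == 1 for row in P), "rows of P must sum to one"
    # Step 1: consecutive intervals I_j with lambda(I_j) = pi_j(0).
    edges = [F(0)]
    for p in pi0:
        edges.append(edges[-1] + p)
    assert edges[-1] == 1, "initial state probabilities must sum to one"
    segs = []
    for j in range(m):
        left = edges[j]
        for k in range(m):
            # Step 2: I_jk has length pi_j(0) * p_jk.
            length = pi0[j] * P[j][k]
            if length > 0:
                # Step 3: the segment from (e_jk,l, e_k,l) to (e_jk,r, e_k,r).
                segs.append((left, left + length, edges[k], edges[k + 1]))
            left += length
    return edges, segs

def slopes(segs):
    """Slope of each segment; by (6.8) this equals pi_k(0) / (pi_j(0) p_jk)."""
    return [(y1 - y0) / (x1 - x0) for x0, x1, y0, y1 in segs]

# The example of (6.4).
pi0 = [F(1, 4), F(1, 4), F(1, 2)]
P = [[F(1, 2), F(1, 2), F(0)], [F(0), F(0), F(1)], [F(1, 3), F(1, 3), F(1, 3)]]
edges, segs = synthesize(pi0, P)
print([str(s) for s in slopes(segs)])   # ['2', '2', '2', '3/2', '3/2', '3']
```

The printed slopes agree with (6.8): for instance, $I_{31}$ has length $.5 \cdot \tfrac{1}{3} = \tfrac{1}{6}$ and is mapped onto $I_1$ of length $.25$, giving slope $3/2$.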


The synthesis procedure is not the only one possible. For example, substituting the
line segment with left and right endpoint pairs $(e_{jk,l}, e_{k,r})$ and $(e_{jk,r}, e_{k,l})$ for the segment with
left and right endpoint pairs $(e_{jk,l}, e_{k,l})$ and $(e_{jk,r}, e_{k,r})$ yields the same results.

The subintervals $\{I_j\}$ correspond to the $m$ states of the Markov chain. The state sequence arising
with almost all initial conditions $x(0)$ for the induced dynamical system $x(n+1) = f(x(n))$
is a sample path of the desired Markov chain, provided that $x(0)$ is a random variable with
constant PDF over the unit interval. As discussed earlier in this thesis, randomness in a
deterministic, chaotic system is due solely to randomness in the initial condition. However,
this randomness constraint on the initial condition is necessary only to ensure that the initial
state probabilities have the desired values. Consider the situation in which we only observe
the partition element in which $x(n)$ lies at each time $n$, or equivalently we know the state
sequence of the Markov chain and nothing else. Then, regardless of the distribution of
the initial condition, the transitions between states will be random events with the desired
transition probabilities. In other words, with restricted knowledge about the orbit point
$x(n)$ at each time $n$, the original, deterministic process becomes a random process with the
desired Markov chain structure.
Although the map $f$ that results from the synthesis procedure is an MC map, the
partition given by the set of intervals $\{I_j\}$ is not a Markov partition for the map, since the
restriction of $f$ to each of these intervals is not an affine mapping, but instead a piecewise
linear mapping. However, the finer partition given by the set of subintervals $\{I_{jk}\}$ is in fact
a Markov partition for $f$.

We  now  extend  the  relation  between  Markov  chains  and  MC  maps  in  two  ways,  with 
both  extensions  useful  for  the  detection  and  estimation  applications  considered  in  the  next 
chapter.  First,  we  have  the  following  result  that  simplifies  the  selection  of  Markov  partitions. 

Proposition 1: Given a TPM with rational-valued elements and a vector of rational-valued,
nonzero initial state probabilities for a Markov chain, one can synthesize an MC map
and, if desired, an EMC map that gives rise to this Markov chain, with each of the affine
segments of the map having integer-valued slope.

Proof:  (see  Appendix  B) 

Second,  the  procedure  discussed  earlier  for  synthesizing  Markov  maps  which  give  rise 
to  Markov  chains  requires  that  the  initial  condition  x(0)  have  a  constant  PDF  over  the 
unit  interval,  that  the  initial  state  probabilities  all  be  nonzero,  and  that  the  length  of  each 
interval  Ij  be  equal  to  the  initial  probability  of  the  corresponding  state.  Each  of  these 
requirements  can  be  relaxed  and  the  Markov  chain  property  still  hold,  as  indicated  by  the 
following  corollary. 

Corollary 1: The Markov chain property of the Markov map designed with the procedure
discussed earlier still holds if the initial condition $x(0)$ is a random variable with constant
PDF over each interval $I_j$, but the constant value need not be the same for different intervals.
In this case, the initial state probability for $S_j$, the state corresponding to interval $I_j$, is
given by the product of the interval length $\lambda(I_j)$ and the constant value associated with
that interval by the PDF of the initial condition.

Proof: We omit a formal proof. Instead, we note that the proof of Proposition 1 still
holds with only minor changes when the PDF of the initial condition is constant over each
subinterval $I_j$. The state transition probabilities remain the same since they are determined
by ratios of subinterval lengths, i.e., $p_{jk} = \lambda(I_{jk})/\lambda(I_j)$. (Note that this corollary may in part be
implied by Corollary 1.2 in [61].)
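Corollary 1 can be illustrated numerically. The Python sketch below hardcodes the map synthesized from (6.4), whose segments follow from (6.5)-(6.7) and step 3, draws $x(0)$ from a piecewise-constant PDF with hypothetical initial state probabilities [0.6, 0.3, 0.1] instead of [.25, .25, .5], and estimates the one-step transition probabilities, which should still be close to the entries of (6.4). This is a Monte Carlo illustration under stated assumptions, not a proof; the seed and sample size are arbitrary.

```python
import bisect
import random
from collections import Counter

# Segments (x0, x1, y0, y1) of the map synthesized from (6.4) with
# Pi(0) = [.25, .25, .5]; each I_jk is mapped increasingly onto I_k.
EDGES = [0.0, 0.25, 0.5, 1.0]
SEGS = [(0.0, 0.125, 0.0, 0.25), (0.125, 0.25, 0.25, 0.5), (0.25, 0.5, 0.5, 1.0),
        (0.5, 2 / 3, 0.0, 0.25), (2 / 3, 5 / 6, 0.25, 0.5), (5 / 6, 1.0, 0.5, 1.0)]
P = [[1 / 2, 1 / 2, 0], [0, 0, 1], [1 / 3, 1 / 3, 1 / 3]]

def f(x):
    """Evaluate the map; a breakpoint belongs to the piece on its right."""
    for x0, x1, y0, y1 in SEGS:
        if x0 <= x < x1 or x == x1 == 1.0:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError(x)

def state(x):
    """Index j of the interval I_j containing x."""
    return min(bisect.bisect_right(EDGES, x) - 1, len(EDGES) - 2)

def estimate(q, n_samples=200_000, seed=2):
    """x(0) uniform within I_j, with I_j chosen with probability q[j]."""
    rng = random.Random(seed)
    starts, moves = Counter(), Counter()
    for _ in range(n_samples):
        j = rng.choices(range(3), weights=q)[0]
        x0 = rng.uniform(EDGES[j], EDGES[j + 1])
        s0, s1 = state(x0), state(f(x0))
        starts[s0] += 1
        moves[s0, s1] += 1
    init = [starts[j] / n_samples for j in range(3)]
    trans = [[moves[j, k] / starts[j] for k in range(3)] for j in range(3)]
    return init, trans

init, trans = estimate(q=[0.6, 0.3, 0.1])   # hypothetical initial state probabilities
print(init)
print(trans)
```

The estimated initial state probabilities follow the chosen piecewise-constant PDF, while the estimated transition probabilities remain close to (6.4), as the corollary asserts.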

In light of the above corollary, we are free to specify the lengths of the
intervals corresponding to each state of the Markov chain arbitrarily, provided that each interval has
nonzero length and the distribution of $x(0)$ is chosen appropriately. This result is useful if
we wish to cascade two MC maps, in the sense that we generate an orbit segment $\{x(i)\}_{i=0}^{N}$
with one map and then use the final orbit point $x(N)$ as the initial condition for the second
map. As we discuss in Section 6.5, a subclass of EMC maps has unique, stationary PDFs
that are constant over the partition elements of suitably chosen Markov partitions. As a
result, if the two EMC maps are synthesized appropriately, the state sequence arising from
the second map with initial condition $x(N)$ is a sample path of a Markov chain as well.


6.4  Markov  Partitions 

The  questions  arise  as  to  how  one  goes  about  choosing  Markov  partitions  for  MC  maps 
and  how  one  determines  if  a  piecewise  linear  map  is  a  Markov  map  or  an  MC  map.  The 
selection  of  Markov  partitions  for  MC  maps  is  an  important  consideration  when  these  maps 
are  used  for  detection  and  estimation  applications. 

Unfortunately,  there  are  no  comprehensive  answers  to  these  questions  but  instead  a  host 
of  partial  answers.  For  example,  one  necessary  condition  for  a  piecewise  linear  map  to  be  a 
Markov  map  is  that  the  endpoints  of  the  affine  segments  be  periodic  or  eventually  periodic 
points  of  the  map.  The  necessity  of  this  condition  is  a  consequence  of  the  fact  that  a  Markov 
map  must  have  a  Markov  partition  for  which  the  restriction  of  the  map  to  each  partition 
element  is  an  affine  transformation.  Thus,  the  endpoints  of  the  affine  segments  must  be 
partition  points  for  such  a  partition.  Since  partition  points  are  required  to  map  to  partition 
points  and  each  partition  is  required  to  have  a  finite  number  of  elements,  it  follows  that 
these  endpoints  must  be  periodic  or  eventually  periodic  points  of  the  map.  Otherwise,  there 
would  be  an  infinite  number  of  partition  points  and  concomitantly  an  infinite  number  of 
partition  elements. 
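The eventual periodicity of such endpoints can be observed directly with exact arithmetic. The Python sketch below (the function names and the convention of evaluating a breakpoint on the piece to its right are our assumptions) iterates a piecewise affine map from a rational point until a point repeats and reports the preperiod and period. It is applied to the doubling map, assumed for Figure 6-1 (a), and to breakpoints of the map synthesized from (6.4).

```python
from fractions import Fraction as F

def value(segs, x):
    """f(x) for a piecewise affine map; a breakpoint belongs to the piece on its right."""
    for x0, x1, y0, y1 in segs:
        if x0 <= x < x1 or x == x1 == 1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError(x)

def orbit_type(segs, x, max_iter=10_000):
    """Return (preperiod, period) of the orbit of x, or None if no repetition
    is found within max_iter iterations."""
    seen = {}
    for n in range(max_iter):
        if x in seen:
            return seen[x], n - seen[x]
        seen[x] = n
        x = value(segs, x)
    return None

doubling = [(F(0), F(1, 2), F(0), F(1)), (F(1, 2), F(1), F(0), F(1))]
synth = [(F(0), F(1, 8), F(0), F(1, 4)), (F(1, 8), F(1, 4), F(1, 4), F(1, 2)),
         (F(1, 4), F(1, 2), F(1, 2), F(1)), (F(1, 2), F(2, 3), F(0), F(1, 4)),
         (F(2, 3), F(5, 6), F(1, 4), F(1, 2)), (F(5, 6), F(1), F(1, 2), F(1))]
print(orbit_type(doubling, F(1, 6)))   # (1, 2): 1/6 -> 1/3 -> 2/3 -> 1/3 -> ...
print(orbit_type(doubling, F(1, 5)))   # (0, 4)
print([orbit_type(synth, p) for p in (F(1, 8), F(2, 3), F(5, 6))])   # [(3, 1), (3, 1), (2, 1)]
```

The `max_iter` guard matters for maps with non-integer slopes, whose rational points need not be eventually periodic; a result of `None` only means that no repetition was found within the budget.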

One special case in which the periodic points are particularly simple to find is when
the slope of each affine segment is integer-valued. As shown in [78], in this special case,
all rational points are periodic or eventually periodic. If in addition the endpoints of the
domains of the affine segments and the images of these endpoints are rational-valued, the
following, stronger result holds as well.

Proposition 2: For any piecewise linear map $f$ of the unit interval to itself for which
the slope of each affine segment is integer-valued and the endpoints of the affine segment
domains, along with the images of these endpoints, are rational-valued, there exists a uniform
Markov partition of the map. Furthermore, any uniform refinement of such a partition is
also a Markov partition. By a uniform Markov partition, we mean a Markov partition for
which each partition element has the same length. By a uniform refinement of a uniform
Markov partition, we mean any uniform partition whose set of partition
points includes those of the original uniform partition.

Proof:  (See  Appendix  B) 

Earlier we showed that one can synthesize an MC map with integer-valued slopes that
gives rise to any finite-state, homogeneous Markov chain with rational-valued transition
probabilities. As indicated by the above proposition, for these maps one can always find a
sequence of increasingly finer, uniform Markov partitions.
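The proof of Proposition 2 is in Appendix B. As an independent illustration, not that proof, the Python sketch below computes a candidate grid size $D$ as the least common multiple of the denominators of all segment endpoints and their images, then verifies that the uniform partitions with $D$ and $2D$ elements are Markov. The reasoning behind this choice of $D$: on a piece with integer slope $\tau$ through a point $(a, f(a))$ whose coordinates are multiples of $1/D$, $f(k/D) = f(a) + \tau(k/D - a)$ is again a multiple of $1/D$. The example map (slopes 3, -3, 2) and the function names are hypothetical; `math.lcm` with several arguments requires Python 3.9 or later.

```python
from fractions import Fraction as F
from math import lcm

def uniform_markov_size(segs):
    """Candidate grid size D: lcm of the denominators of all segment endpoints
    and their images. Assumes every slope is an integer."""
    dens = []
    for x0, x1, y0, y1 in segs:
        assert ((y1 - y0) / (x1 - x0)).denominator == 1, "integer slopes assumed"
        dens += [x0.denominator, x1.denominator, y0.denominator, y1.denominator]
    return lcm(*dens)

def check_uniform(segs, D):
    """True if each of the D equal cells lies in one affine piece and its image
    endpoints are multiples of 1/D (so the uniform partition is Markov)."""
    for k in range(D):
        a, b = F(k, D), F(k + 1, D)
        seg = next((s for s in segs if s[0] <= a and b <= s[1]), None)
        if seg is None:
            return False
        x0, x1, y0, y1 = seg
        slope = (y1 - y0) / (x1 - x0)
        if any(((y0 + slope * (x - x0)) * D).denominator != 1 for x in (a, b)):
            return False
    return True

# Hypothetical example map with slopes 3, -3, 2 and rational breakpoints.
segs = [(F(0), F(1, 6), F(1, 2), F(1)), (F(1, 6), F(1, 2), F(1), F(0)),
        (F(1, 2), F(1), F(0), F(1))]
D = uniform_markov_size(segs)
print(D, [check_uniform(segs, M) for M in (D, 2 * D, 4)])   # 6 [True, True, False]
```

The partition into four equal elements fails because the breakpoint 1/6 is not one of its partition points, which is consistent with the requirement that a uniform refinement contain the original partition points.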

When given both an MC map $f$ for which the slopes of the affine segments are not all
integer-valued and a Markov partition for the map, the question arises as to how one goes
about finding Markov partitions that are refinements of this partition. A simple approach is
to use successive inverse images of the original partition points as additional partition points.
These additional points get mapped by $f$ or compositions of $f$ to the original partition
points, and thus satisfy the requirement that partition points be mapped to partition points.
This approach was used in [14, 46] to approximate arbitrary, monotonically expanding,
one-dimensional maps by piecewise linear Markov maps.
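One step of this refinement can be sketched as follows. The Python code below (hypothetical names, exact fractions) adds the inverse images of the partition points to the Markov partition $\{I_{jk}\}$ of the map synthesized from (6.4), whose slopes 2, 3/2, and 3 are not all integers, and checks that each element of the refined partition lies in one affine piece and maps onto an interval whose endpoints are partition points. Repeating the step gives successively finer partitions.

```python
from fractions import Fraction as F

def preimages(segs, y):
    """All x with f(x) = y on some closed, nonconstant affine piece."""
    out = set()
    for x0, x1, y0, y1 in segs:
        if y0 != y1 and min(y0, y1) <= y <= max(y0, y1):
            out.add(x0 + (y - y0) * (x1 - x0) / (y1 - y0))
    return out

def refine(segs, points):
    """One refinement step: add the inverse images of the current partition points."""
    new = set(points)
    for p in points:
        new |= preimages(segs, p)
    return sorted(new)

def images_ok(segs, points):
    """Each cell lies in one piece and its image endpoints are partition points."""
    pset = set(points)
    for a, b in zip(points, points[1:]):
        seg = next((s for s in segs if s[0] <= a and b <= s[1]), None)
        if seg is None:
            return False
        x0, x1, y0, y1 = seg
        for x in (a, b):
            if y0 + (y1 - y0) * (x - x0) / (x1 - x0) not in pset:
                return False
    return True

# Map synthesized from (6.4); segments (x0, x1, y0, y1).
synth = [(F(0), F(1, 8), F(0), F(1, 4)), (F(1, 8), F(1, 4), F(1, 4), F(1, 2)),
         (F(1, 4), F(1, 2), F(1, 2), F(1)), (F(1, 2), F(2, 3), F(0), F(1, 4)),
         (F(2, 3), F(5, 6), F(1, 4), F(1, 2)), (F(5, 6), F(1), F(1, 2), F(1))]
coarse = [F(0), F(1, 8), F(1, 4), F(1, 2), F(2, 3), F(5, 6), F(1)]
fine = refine(synth, coarse)
print([str(p) for p in fine])
print(images_ok(synth, coarse), images_ok(synth, fine))
```

The refined partition has 12 elements instead of 6; its new points, such as 1/16 and 7/12, are mapped by $f$ onto original partition points.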

6.5  Ergodic  Properties  of  EMC  Maps 

As one might expect, there is a strong relation between the ergodic properties of MC maps
and the Markov chains to which they give rise. In this section, we explore this relation
and in particular show that ergodicity of an EMC map is equivalent to irreducibility of
the TPMs of the Markov chains the map gives rise to, and exactness of an EMC map is
equivalent to primitivity of the TPMs of the Markov chains the map gives rise to. As we
discuss later in the section, the practical relevance of stationary PDFs in the context of
EMC maps is that they facilitate the synthesis of maps, not necessarily EMC maps, which
give rise to any specified Markov chain and any specified stationary PDF. We do not consider
the spectral properties of EMC maps, a topic closely related to that of stationary PDFs.
The reader is referred to [37] for an exposition on this topic.


6.5.1  Ergodic  Theory  Fundamentals 

Before  discussing  the  ergodic  properties  of  EMC  maps,  we  briefly  review  several  relevant 
concepts  from  ergodic  theory  not  considered  in  Chapter  2.  Much  of  the  information  is 
extracted  from  [50,  55,  87]  to  which  the  reader  is  referred  for  additional  details. 

As in Section 2.3.2, we consider a measure space $(X, \beta, \mu)$, where $X$ is a set, $\beta$ is a
$\sigma$-algebra of subsets of $X$, and $\mu$ is a measure on $\beta$. Of special interest is the case in which $X$ is the
unit interval $I$ (or an arbitrary interval on the real line) and $\beta$ is the Borel $\sigma$-algebra on
$I$, which is defined as the smallest $\sigma$-algebra that includes the open intervals on the real
line intersected with $I$. Also of special interest is the case in which $\mu$ is either Lebesgue measure
or a measure that is absolutely continuous with respect to Lebesgue measure, which as a
consequence has a corresponding probability density function (PDF). In other words, we
are principally interested in very simple measure spaces with probability measures having
corresponding PDFs.

As noted in our earlier discussion of ergodic properties, given a measure space $(X, \beta, \mu)$, a transformation $f : X \to X$ is measurable if $f^{-1}(B) \in \beta$ for every $B \in \beta$. This means that the inverse image of every element of $\beta$ is also in $\beta$. A measurable transformation is nonsingular if $\mu(f^{-1}(B)) = 0$ for every $B \in \beta$ for which $\mu(B) = 0$. A measurable transformation is measure-preserving if $\mu(f^{-1}(B)) = \mu(B)$ for every $B \in \beta$; $\mu$ is then said to be an invariant measure of $f$. Our principal interest is in measure-preserving transformations for which the invariant measure has a corresponding PDF.

Given a measure space $(X, \beta, \mu)$, let $L^1(\mu)$ denote the set of all absolutely integrable functions on $(X, \beta, \mu)$, where an absolutely integrable function $g$ on $(X, \beta, \mu)$ is a real-valued function (i.e., $g : X \to \mathbb{R}$) that is measurable on $(X, \beta, \mu)$ and satisfies

$$\int_X |g(x)|\, d\mu(x) < \infty. \tag{6.9}$$

Each $h \in L^1(\mu)$ which satisfies $h(x) \ge 0$ for all $x \in X$ and $\int_X h(x)\, d\mu(x) = 1$ is known as a density function on $(X, \beta, \mu)$. When $\mu$ is Lebesgue measure $\lambda$, such a function is simply the PDF familiar to non-mathematicians. Note that each density function $h$ induces a probability measure $\mu_h$ defined by $\mu_h(B) = \int_B h(x)\, d\mu(x)$ for each $B \in \beta$.
For each nonsingular transformation $f : X \to X$ on $(X, \beta, \mu)$, there is a unique operator $P_f : L^1(\mu) \to L^1(\mu)$, known as the Frobenius-Perron operator, which satisfies

$$\int_B P_f(p(x))\, d\mu(x) = \int_{f^{-1}(B)} p(x)\, d\mu(x) \tag{6.10}$$

for each $B \in \beta$ and $p \in L^1(\mu)$ [50, 55]. This rather abstract definition of the Frobenius-Perron operator has a simple, intuitive interpretation when $X$ is the real line $\mathbb{R}$, $\beta$ is the Borel $\sigma$-algebra, $\mu$ is Lebesgue measure, and $p(x)$ is a PDF. With these restrictions, $P_f(p(x))$ is the PDF induced by the transformation $f$: if $x$ has PDF $p(x)$, then $f(x)$ has PDF $P_f(p(x))$. In other words, $P_f(p(x))$ is the PDF which, when integrated over any set $B \in \beta$, gives the same value as is obtained by integrating $p(x)$ over $f^{-1}(B)$. If $f$ is invertible and differentiable and we let $y = f(x)$, the defining expression for $P_f(p(x))$ reduces to the following expression found in many probability textbooks (e.g., [68, p. 118]) for the PDF $p_y(y)$ induced by $f$:


$$p_y(y) = \frac{p(f^{-1}(y))}{\left| f'(f^{-1}(y)) \right|} \tag{6.11}$$

where $f'(f^{-1}(y))$ denotes the derivative of $f$ evaluated at $f^{-1}(y)$.

As one might expect, $P_{f^n} = P_f^n$, which means that the Frobenius-Perron operator for the $n$-fold composition of the transformation $f$ is the same as the $n$-fold composition of the Frobenius-Perron operator for $f$. As such, the Frobenius-Perron operator is the discrete-time Fokker-Planck operator [38, 50] for the special case of a deterministic state equation.
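A quick numerical sanity check of (6.11) can be sketched in Python with NumPy. The choices below are illustrative assumptions, not taken from the text: $x$ is uniform on $[0, 1]$ and $f(x) = x^2$, so (6.11) predicts $p_y(y) = 1/(2\sqrt{y})$. The helper names `induced_pdf` and `bin_mass` are hypothetical; the sketch compares histogram bin masses of sampled values of $f(x)$ against integrals of (6.11) over the same bins.

```python
import numpy as np

def induced_pdf(p, f_inv, f_prime, y):
    """Evaluate (6.11): p_y(y) = p(f^{-1}(y)) / |f'(f^{-1}(y))| for invertible, differentiable f."""
    x = f_inv(y)
    return p(x) / np.abs(f_prime(x))

# Illustrative choices (assumptions): x ~ uniform on [0, 1], f(x) = x**2 (invertible on [0, 1]).
def p_uniform(x):
    return np.ones_like(x)

def f(x):
    return x ** 2

def f_prime(x):
    return 2.0 * x

f_inv = np.sqrt

def bin_mass(a, b, n=1000):
    """Midpoint-rule integral of the predicted PDF over [a, b)."""
    t = a + (np.arange(n) + 0.5) * (b - a) / n
    return induced_pdf(p_uniform, f_inv, f_prime, t).mean() * (b - a)

rng = np.random.default_rng(0)
samples = f(rng.uniform(0.0, 1.0, size=400_000))

# Bins start at 0.1 to stay away from the integrable singularity of p_y at y = 0.
edges = np.linspace(0.1, 1.0, 10)
counts, _ = np.histogram(samples, bins=edges)
empirical = counts / samples.size
predicted = np.array([bin_mass(a, b) for a, b in zip(edges[:-1], edges[1:])])
print(np.max(np.abs(empirical - predicted)))
```

The discrepancy printed is sampling noise; it shrinks as the sample size grows.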

A density function that is a fixed point of $P_f$, i.e., a density function $p(x)$ which satisfies $P_f(p(x)) = p(x)$, is called a stationary density of $f$, since the density function induced by $f$ is the same as the original density function. It follows directly from the defining equation for the Frobenius-Perron operator that if $p(x)$ is a stationary density of $f$ and $\mu_p$ denotes the measure induced by $p(x)$, i.e., $\mu_p(B) = \int_B p(x)\, dx$ for each $B \in \beta$, then $f$ is a measure-preserving transformation on the probability space $(X, \beta, \mu_p)$.
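For noninvertible maps, (6.11) generalizes in the standard way to a sum over all preimages, $P_f(p)(y) = \sum_{x : f(x) = y} p(x)/|f'(x)|$. The sketch below is an illustration not taken from the text: it applies this formula to the logistic map $f(x) = 4x(1-x)$, whose stationary density $1/(\pi\sqrt{x(1-x)})$ is a classical result, and checks that this density is a fixed point of $P_f$ while the uniform density is not. The function names are hypothetical.

```python
import numpy as np

def fp_logistic(p, y):
    """Frobenius-Perron operator of f(x) = 4x(1 - x): sum of p(x)/|f'(x)| over both preimages of y."""
    r = np.sqrt(1.0 - y)
    x_minus, x_plus = (1.0 - r) / 2.0, (1.0 + r) / 2.0
    # |f'(x)| = |4 - 8x| equals 4r at both preimages.
    return (p(x_minus) + p(x_plus)) / (4.0 * r)

def p_star(x):
    return 1.0 / (np.pi * np.sqrt(x * (1.0 - x)))

def p_uniform(x):
    return np.ones_like(x)

y = np.linspace(0.01, 0.99, 99)
rel_residual = np.max(np.abs(fp_logistic(p_star, y) - p_star(y)) / p_star(y))
print(rel_residual)                                     # rounding-error level: p_star is stationary
print(np.max(np.abs(fp_logistic(p_uniform, y) - 1.0)))  # far from zero: uniform is not stationary
```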

As discussed in Chapter 2, a measure-preserving transformation $f$ (on $(X, \beta, \mu)$) is ergodic if the only invariant, measurable sets (i.e., sets $B \in \beta$ satisfying $f^{-1}(B) = B$) have measure 0 or 1. In some references [50, 55], the definition of ergodicity does not require that $f$ be measure-preserving, only that it be nonsingular. However, most properties generally associated with an ergodic transformation, such as the equivalence of time averages and ensemble averages, are valid only when the transformation is measure-preserving.
Our interest in ergodic transformations in this chapter differs from that in earlier chapters, in which we exploited the ergodicity of dissipative, chaotic maps in a heuristic MMSE state estimator. Of interest in this chapter is the fact that an ergodic transformation has at most one stationary density of the Frobenius-Perron operator $P_f$. In particular, if a transformation $f$ is measure-preserving and ergodic on $(X, \beta, \mu)$ and $\mu$ has a corresponding density function $p(x)$, then $p(x)$ is the unique stationary density function of $P_f$. We exploit this property of ergodic, measure-preserving transformations later in the chapter to synthesize signals which give rise to random variables with specified PDFs.

An equivalent condition for ergodicity of a measure-preserving transformation $f$ on a probability space $(X, \beta, \mu)$ is that the following condition hold for all sets $A, B \in \beta$:

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \mu\left(A \cap f^{-i}(B)\right) = \mu(A)\, \mu(B), \tag{6.12}$$

which means that on average $A$ and $f^{-i}(B)$ are independent of each other. A related but stronger property than ergodicity, not used in this chapter, is that of being strong mixing. A measure-preserving transformation $f$ on the probability space $(X, \beta, \mu)$ is strong mixing if the following condition holds for all sets $A, B \in \beta$:

$$\lim_{n \to \infty} \mu\left(A \cap f^{-n}(B)\right) = \mu(A)\, \mu(B), \tag{6.13}$$

which means that $A$ and $f^{-n}(B)$ asymptotically become independent of each other.

Finally, an even stronger concept than strong mixing is exactness. A measure-preserving transformation $f$ on the probability space $(X, \beta, \mu)$ is exact if the following condition holds for all sets $B \in \beta$ with $\mu(B) > 0$:

$$\lim_{n \to \infty} \mu\left(f^n(B)\right) = 1. \tag{6.14}$$

Essentially, this condition means that the successive images under $f$ of each measurable set of positive measure, even one of arbitrarily small measure, expand until they cover almost all of $X$.

As discussed in [50], there is a strong relation among ergodicity, mixing, and exactness. In particular, an exact transformation is mixing, and a mixing transformation is ergodic. Therefore, if $f$ is an exact transformation on $(X, \beta, \mu)$ where $\mu$ has corresponding density function $p(x)$, then $p(x)$ is the unique stationary density function of $P_f$. In addition, given any initial density function $p_0(x)$ on $X$, $P_f^n(p_0(x))$ converges (in $L^1$ norm) to $p(x)$ as $n$ goes to infinity. Therefore, not only does an exact transformation have a unique stationary density function, but all density functions converge to this density function under successive applications of the Frobenius-Perron operator.
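Convergence of $P_f^n(p_0)$ for an exact map can be seen in a small sketch. As an illustration, assumed rather than taken from the text, we use the doubling (shift) map $f(x) = 2x \pmod 1$, which is exact with uniform stationary density; its Frobenius-Perron operator is $P_f(p)(x) = \tfrac{1}{2}[p(x/2) + p((x+1)/2)]$. Starting from $p_0(x) = 2x$, a short induction gives $P_f^n(p_0)(x) = x/2^{n-1} + 1 - 2^{-n}$, so the $L^1$ distance to the uniform density is $2^{-(n+1)}$; the sketch reproduces this numerically.

```python
import numpy as np

def fp_doubling(p):
    """Return P_f(p) for the doubling map f(x) = 2x mod 1 (each y has preimages y/2 and (y+1)/2)."""
    return lambda x: 0.5 * (p(x / 2.0) + p((x + 1.0) / 2.0))

def p0(x):
    return 2.0 * x  # an arbitrary non-uniform initial density on [0, 1]

x = (np.arange(1000) + 0.5) / 1000  # midpoint grid on [0, 1]
p = p0
l1_errors = []
for n in range(1, 13):
    p = fp_doubling(p)
    # Midpoint-rule estimate of the L1 distance to the uniform (stationary) density.
    l1_errors.append(np.mean(np.abs(p(x) - 1.0)))
print(l1_errors)  # halves at every application of P_f
```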

Although it is often difficult to determine whether a given system is ergodic, mixing, or exact, the definitions of these properties are straightforward and their consequences are well understood. Furthermore, for the systems of interest in this chapter, in particular MC maps, the underlying measure space is extremely simple. This situation contrasts markedly with the one considered earlier in the thesis, where defining the nebulous concept of chaos and defining an appropriate topology or measure space on the attractors of dissipative systems suspected of being chaotic were formidable tasks.

We now consider the relation between the ergodic concepts defined above and the topological property most often associated with chaos: sensitive dependence on initial conditions. For convenience, we consider the measure space $(X, \beta, \lambda)$ with $X$ denoting the unit interval, $\beta$ denoting the Borel $\sigma$-algebra, and $\lambda$ denoting Lebesgue measure; and we use the standard topology on the unit interval with basis given by the intersection of open intervals with the unit interval. Our first observation is that an ergodic system need not have sensitive dependence on initial conditions. For example, the rotation map with state equation given by

$$x(n+1) = f(x(n)) = x(n) + a \pmod{1} \tag{6.15}$$

is ergodic if the constant $a$ is irrational [87]. However, the map does not have sensitive dependence on initial conditions, since distances between points on the unit interval are preserved by $f$, except for a wrap-around effect at the endpoints.
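The contrast between ergodicity and sensitive dependence can be seen numerically for (6.15). In the sketch below, $a = \sqrt{2} - 1$ is an illustrative choice of irrational constant, and `circle_distance` is a hypothetical helper that measures distance with the wrap-around at the endpoints taken into account. The time average of the indicator of $[0, 0.3[$ along one orbit approaches its Lebesgue measure 0.3, as ergodicity predicts, while the distance between two nearby orbits stays (up to rounding) at its initial value.

```python
import math

def rotation(x, a):
    """One step of the rotation map (6.15): x -> x + a (mod 1)."""
    return (x + a) % 1.0

def circle_distance(x, y):
    """Distance on the unit interval with its endpoints identified (wrap-around)."""
    d = abs(x - y) % 1.0
    return min(d, 1.0 - d)

a = math.sqrt(2.0) - 1.0   # illustrative irrational constant
x, y = 0.2, 0.201          # two nearby initial conditions
d0 = circle_distance(x, y)
n_steps = 100_000
hits = 0
max_drift = 0.0
for _ in range(n_steps):
    hits += x < 0.3        # indicator of the set [0, 0.3[
    max_drift = max(max_drift, abs(circle_distance(x, y) - d0))
    x, y = rotation(x, a), rotation(y, a)

time_average = hits / n_steps
print(time_average, max_drift)
```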

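However, exactness of a map $f$ implies sensitive dependence on initial conditions. This follows almost directly from the definition of exactness. Specifically, given any point $x$ on the unit interval and any subinterval $J_x$ of positive length containing $x$, $\lim_{n \to \infty} \lambda(f^n(J_x)) = 1$ by definition of exactness. Therefore, for any $\delta \in (0, 1)$, there exists an integer $N(x, \delta)$ such that $\lambda(f^{N(x,\delta)}(J_x)) > 1 - \delta$. Since this set contains $f^{N(x,\delta)}(x)$ and has measure greater than $1 - \delta$, it cannot lie entirely within distance $(1 - \delta)/2$ of $f^{N(x,\delta)}(x)$. As such, there exists a point $y \in J_x$ for which $|f^{N(x,\delta)}(y) - f^{N(x,\delta)}(x)| > (1 - \delta)/2$.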
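The argument can be illustrated with the doubling map $f(x) = 2x \pmod 1$, an exact map chosen here for illustration. The sketch uses exact rational arithmetic (`fractions.Fraction`) so that floating-point effects do not interfere; the starting point $x = 1/3$, the perturbation $10^{-6}$, and $\delta = 1/2$ are arbitrary assumptions. It reports the first iterate at which the two orbits are more than $(1 - \delta)/2 = 1/4$ apart.

```python
from fractions import Fraction

def doubling(x):
    """Doubling map f(x) = 2x (mod 1), computed exactly on rationals."""
    return (2 * x) % 1

def first_separation(x, y, threshold, max_steps=100):
    """Smallest n with |f^n(y) - f^n(x)| > threshold, or None if not reached within max_steps."""
    for n in range(max_steps + 1):
        if abs(y - x) > threshold:
            return n
        x, y = doubling(x), doubling(y)
    return None

x0 = Fraction(1, 3)
y0 = x0 + Fraction(1, 10**6)
delta = Fraction(1, 2)
N = first_separation(x0, y0, (1 - delta) / 2)
print(N)  # 18: the separation 2**n * 1e-6 first exceeds 1/4 at n = 18
```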

In contrast, a map is not necessarily exact even if it is both ergodic and has sensitive dependence on initial conditions. The map $g(x)$ shown in Figure 6-4 (a) is ergodic and has sensitive dependence on initial conditions. We do not formally prove these claims here, but sketch one approach for proving them. One can verify both claims by considering the composed map $g^3$ shown in Figure 6-4 (b). As indicated by the figure, $g^3$ consists of three separate shift maps, one acting on each of the subintervals $[0, 1/3[$, $[1/3, 2/3[$, and $[2/3, 1]$. Each of these subintervals is invariant under $g^3$. Ergodicity is established by using the facts that the shift map is ergodic (as shown in [50, 87]) and that $g$ applied to each of the three subintervals $[0, 1/3[$, $[1/3, 2/3[$, and $[2/3, 1]$ consists of either a translation or a shift followed by a translation. With these facts, it is straightforward to verify the equivalent condition for ergodicity proven in [87], which requires that for each measurable set $A$ with nonzero measure the following condition holds:

$$\lambda\left(\bigcup_{n=1}^{\infty} g^{-n}(A)\right) = 1. \tag{6.16}$$

Figure 6-4: (a) Map $g$ having sensitive dependence on initial conditions that is ergodic but not exact; (b) $g^3$


Sensitive dependence on initial conditions follows from the fact that the shift map exhibits sensitive dependence on initial conditions and $g^3$ consists of three shift maps, one applied to each of the three invariant sets of $g^3$ given by $[0, 1/3[$, $[1/3, 2/3[$, and $[2/3, 1]$.

Therefore, the map shown in Figure 6-4 (a) is ergodic and has sensitive dependence on initial conditions. However, the map is not exact, as it is straightforward to verify that $\lim_{n \to \infty} \lambda(g^n(J)) = 1/3 \neq 1$ for every subinterval $J \subset [0, 1/3[$.

Concepts similar to ergodicity and exactness apply to the transition probability matrices (TPMs) of homogeneous, finite-state Markov chains. In particular, let $P = [p_{ij}]$ denote the TPM of a homogeneous, finite-state Markov chain, where $p_{ij}$ denotes the probability of transitioning from state $i$ to state $j$ in one time step, and let $P^n = [p_{ij}^{(n)}]$ denote the $n$-step TPM of the Markov chain, where $p_{ij}^{(n)}$ denotes the probability of transitioning from state $i$ to state $j$ in exactly $n$ time steps. $P^n$ is the $n$th power of $P$, but in general $p_{ij}^{(n)}$ is not the $n$th power of $p_{ij}$. The TPM is irreducible if for each pair of indices $(i, j)$ there exists a positive integer $n_{ij}$ such that $p_{ij}^{(n_{ij})} > 0$. Intuitively, irreducibility of the transition matrix means that it is possible to eventually get from each state of the Markov chain to every other state. A well-known result from linear algebra, the Perron-Frobenius theorem [29, 49], is that every irreducible, nonnegative matrix has, up to scaling, a unique left eigenvector with strictly positive entries, associated with its largest eigenvalue; for a TPM, this eigenvalue is 1. It follows that every Markov chain with an irreducible TPM has a unique, invariant state probability vector, i.e., a row vector of probabilities $\Pi$ that satisfies $\Pi P = \Pi$, and every entry of $\Pi$ is strictly positive. A Markov chain with irreducible TPM is also ergodic with respect to this invariant probability vector.
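Both notions can be checked mechanically. The sketch below, with hypothetical helper names and an arbitrary three-state TPM chosen for illustration, tests irreducibility by checking that $(I + A)^{n-1}$ has no zero entries, where $A$ is the zero pattern of $P$ (a standard criterion), and computes the invariant row vector $\Pi$ by solving $\Pi P = \Pi$ together with $\sum_i \pi_i = 1$.

```python
import numpy as np

def is_irreducible(P):
    """Irreducible iff (I + A)^(n-1) has no zero entries, A being the 0/1 pattern of P."""
    n = P.shape[0]
    step = np.minimum((P > 0).astype(int) + np.eye(n, dtype=int), 1)
    reach = np.eye(n, dtype=int)
    for _ in range(n - 1):
        reach = np.minimum(reach @ step, 1)  # keep entries 0/1 to avoid overflow
    return bool(reach.all())

def stationary_vector(P):
    """Solve Pi P = Pi with sum(Pi) = 1 (least squares on the stacked linear system)."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.5, 0.5, 0.0]])
print(is_irreducible(P), stationary_vector(P))  # True, approximately [0.2, 0.4, 0.4]
```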

If, in addition to being irreducible, the TPM has the property that there is a single positive integer $m$ such that $p_{ij}^{(m)} > 0$ for all pairs of indices $(i, j)$, then the TPM is also primitive. Intuitively, a Markov chain with primitive TPM is one in which it is possible to get from each state to itself and to each other state in exactly the same number of time steps. A Markov chain with primitive TPM has the property that any initial state probability vector converges to the unique, invariant probability vector as time goes to infinity.
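Primitivity can likewise be tested through the zero pattern of a single power of $P$; by Wielandt's theorem it suffices to examine $P^{(n-1)^2+1}$ for an $n$-state chain. The sketch below (hypothetical names, illustrative matrices) contrasts a primitive three-state chain, whose state probability vector converges to $[0.2, 0.4, 0.4]$ from any start, with the irreducible but periodic two-state chain that alternates deterministically between its states, for which convergence fails.

```python
import numpy as np

def is_primitive(P):
    """Wielandt: an n x n nonnegative matrix is primitive iff its power (n-1)**2 + 1 is entrywise positive."""
    n = P.shape[0]
    A = (P > 0).astype(int)
    M = np.eye(n, dtype=int)
    for _ in range((n - 1) ** 2 + 1):
        M = np.minimum(M @ A, 1)  # zero pattern of successive powers of P
    return bool(M.all())

def propagate(q, P, steps):
    """State probability row vector after the given number of steps."""
    for _ in range(steps):
        q = q @ P
    return q

P_prim = np.array([[0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [0.5, 0.5, 0.0]])
P_periodic = np.array([[0.0, 1.0],
                       [1.0, 0.0]])

q_prim = propagate(np.array([1.0, 0.0, 0.0]), P_prim, 200)
q_even = propagate(np.array([1.0, 0.0]), P_periodic, 200)
q_odd = propagate(np.array([1.0, 0.0]), P_periodic, 201)
print(is_primitive(P_prim), q_prim)             # True, close to [0.2, 0.4, 0.4]
print(is_primitive(P_periodic), q_even, q_odd)  # False, [1, 0], [0, 1]: no convergence
```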


6.5.2 EMC Maps, Markov Chains, and Stationary PDFs¹

Unless indicated otherwise, in this section all EMC maps are assumed to be maps of the unit interval to itself. By appropriately merging results from [12, 25, 55, 87], we can establish the following strong connection between EMC maps and the Markov chains to which they give rise.

Proposition  3:  a.  Ergodicity  of  an  EMC  map  for  which  each  subinterval  of  the  unit 
interval  has  nonzero  measure  is  equivalent  to  irreducibility  of  the  TPMs  of  the  Markov 
chains  the  map  gives  rise  to.  Thus,  an  ergodic  EMC  map  has  a  unique  stationary  PDF, 
and  each  Markov  chain  it  gives  rise  to  has  a  unique,  invariant  state  probability  vector. 

b. Exactness of an EMC map for which each subinterval of the unit interval has nonzero measure is equivalent to primitivity of the TPMs of the Markov chains the map gives rise to. Thus, not only does an exact EMC map have a unique stationary PDF, but given any nonsingular PDF for $x(0)$, the PDF of $x(n)$ converges as $n$ goes to infinity to the unique stationary PDF. Similarly, each Markov chain which the map gives rise to has a unique, invariant state probability vector to which all other initial state probability vectors converge.

Proof: (See Appendix B)

¹Some of the work in this section was performed in conjunction with S. Isabelle at MIT.

As noted in [12] and used in the proof of Proposition 3, the stationary PDF of an ergodic EMC map is piecewise constant. A simple way to find it is the following. First, find a Markov partition $\{I_j\}_{j=1}^{n}$ and the TPM of the corresponding Markov chain. Next, find the invariant probability vector for this TPM. Let $\Pi = [\pi_1, \ldots, \pi_n]$ denote this vector, where $\pi_j$ denotes the invariant probability associated with the state corresponding to subinterval $I_j$. The stationary PDF for all points in $I_j$ is given by $\pi_j / \lambda(I_j)$.
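The recipe can be sketched for a small, hypothetical two-element example that is not taken from the text: $I_1 = [0, 1/2[$ with $f(x) = 2x$ (mapped onto $I_1 \cup I_2$) and $I_2 = [1/2, 1]$ with $f(x) = x - 1/2$ (mapped onto $I_1$), so the transition probabilities, given by ratios of interval lengths, are $p_{11} = p_{12} = 1/2$, $p_{21} = 1$, and $p_{22} = 0$. The sketch computes $\Pi$, forms the piecewise constant density $\pi_j / \lambda(I_j)$, and checks that this density is a fixed point of the Frobenius-Perron operator of $f$ (written as a sum over preimages).

```python
import numpy as np

# Hypothetical two-element piecewise linear Markov map (illustration only):
#   I1 = [0, 1/2): f(x) = 2x       maps I1 onto I1 u I2
#   I2 = [1/2, 1]: f(x) = x - 1/2  maps I2 onto I1
P = np.array([[0.5, 0.5],
              [1.0, 0.0]])
lengths = np.array([0.5, 0.5])  # lambda(I1), lambda(I2)

# Invariant probability vector: solve Pi P = Pi with sum(Pi) = 1.
A = np.vstack([P.T - np.eye(2), np.ones(2)])
b = np.array([0.0, 0.0, 1.0])
pi = np.linalg.lstsq(A, b, rcond=None)[0]
density_values = pi / lengths  # stationary PDF is pi_j / lambda(I_j) on I_j

def p_stat(x):
    return np.where(x < 0.5, density_values[0], density_values[1])

def fp_operator(p, y):
    """Sum of p(x)/|f'(x)| over the preimages x of y under f."""
    from_I1 = p(y / 2.0) / 2.0                        # preimage x = y/2 in I1, slope 2
    from_I2 = np.where(y < 0.5, p(y + 0.5), 0.0)      # preimage x = y + 1/2 in I2, slope 1
    return from_I1 + from_I2

y = (np.arange(1000) + 0.5) / 1000  # midpoint grid on [0, 1]
print(pi, density_values)            # approximately [2/3, 1/3] and [4/3, 2/3]
print(np.max(np.abs(fp_operator(p_stat, y) - p_stat(y))))
```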

Although EMC maps which give rise to Markov chains with irreducible TPMs have piecewise constant stationary PDFs, one cannot independently specify this PDF and a specific Markov chain which the map gives rise to. However, using an EMC map as a building block, one can synthesize a Markov map (which may or may not be an EMC map) that has any specified (sufficiently well-behaved) stationary PDF and which also gives rise to a Markov chain with any specified irreducible TPM. To do so, one utilizes the following relation, established in [12], between maps with stationary PDFs and maps derived from them via homeomorphisms:

Proposition [12]: For any EMC map $f$ which gives rise to a Markov chain with irreducible TPM and any differentiable homeomorphism $h : I \to I$, the transformation $g = h \circ f \circ h^{-1}$ has a unique stationary PDF $p_g(x)$, which is given by

$$p_g(x) = \frac{p_f(h^{-1}(x))}{\left| h'(h^{-1}(x)) \right|},$$

where $p_f(x)$ is the unique stationary PDF of $f$ and $h'(h^{-1}(x))$ denotes the derivative of $h$ evaluated at $h^{-1}(x)$.

The above proposition is essentially a restatement of the relation considered earlier (in the discussion of the Frobenius-Perron operator) between the PDF of a random variable and the PDF resulting from a memoryless transformation of that random variable. In this case, the random variable is $x(n)$ with PDF $p_f$, $h$ is the transformation, and $p_g$ is the PDF of the transformed random variable $h(x(n))$.

As noted in [68, p. 261] and in other probability textbooks, given a random variable $v$ with PDF $p_v$ and distribution function $F_v$, where

$$F_v(v_0) = \int_{-\infty}^{v_0} p_v(v)\, dv, \tag{6.17}$$

one can create a random variable $w$ with specified PDF $p_w$, or equivalently specified distribution function $F_w(w_0) = \int_{-\infty}^{w_0} p_w(w)\, dw$, using the transformation $F_w^{-1} \circ F_v$, so that

$$w = F_w^{-1}(F_v(v)). \tag{6.18}$$

This works because, for continuous $F_v$, the random variable $F_v(v)$ is uniformly distributed on $[0, 1]$. An implicit assumption with this transformation is the invertibility of $F_w$, which is always the case if $p_w(w) > 0$ for all real $w$ or if the domain of $F_w$ is restricted to those $w$ for which $p_w(w) > 0$.
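A minimal sketch of (6.18), under illustrative assumptions not taken from the text: $v$ is exponential with unit mean, so $F_v(v) = 1 - e^{-v}$, and the target PDF is $p_w(w) = 3w^2$ on $[0, 1]$, so $F_w(w) = w^3$ and $F_w^{-1}(u) = u^{1/3}$. The sample mean and the fraction of samples below $1/2$ should then be close to $3/4$ and $1/8$.

```python
import numpy as np

def transform_to_target(v, F_v, F_w_inv):
    """Apply (6.18): w = F_w^{-1}(F_v(v))."""
    return F_w_inv(F_v(v))

def F_v(v):
    return 1.0 - np.exp(-v)  # distribution function of an exponential variable with unit mean

def F_w_inv(u):
    return np.cbrt(u)        # inverse of F_w(w) = w**3, i.e. p_w(w) = 3 w**2 on [0, 1]

rng = np.random.default_rng(1)
v = rng.exponential(scale=1.0, size=200_000)
w = transform_to_target(v, F_v, F_w_inv)
print(w.mean(), np.mean(w < 0.5))  # close to 0.75 and 0.125
```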

We can use this result for transforming PDFs and the earlier proposition to synthesize a Markov map that has any specified (well-behaved) stationary PDF and which gives rise to a Markov chain with any specified irreducible TPM. The procedure is as follows. First, synthesize an EMC map $f$ which gives rise to the desired Markov chain. From Proposition 3, we know that such a map has a stationary PDF that is piecewise constant, and in particular constant over each element of any Markov partition. Furthermore, the PDF must be nonzero almost everywhere on the unit interval. Otherwise, there would be a subinterval of zero density, which would necessarily correspond to a partition element or union of partition elements in any Markov partition, since the stationary PDF is constant over partition elements. Consequently, this subinterval of zero density would be associated with at least one state in the associated Markov chain. The invariant probability for this state would necessarily be zero, since its value is given by the integral of the stationary PDF over the corresponding subinterval. However, by the Perron-Frobenius theorem, each element of the invariant vector of an irreducible TPM is strictly positive. Thus, the stationary PDF must be nonzero almost everywhere.

Let $p_f$ denote the stationary PDF of $f$ and $F_f$ the corresponding distribution function. Let $p_g$ denote the specified stationary PDF and $F_g$ the corresponding distribution function. Now let $h = F_g^{-1} \circ F_f$ in the above proposition. If $h$ is a differentiable homeomorphism, then by this proposition and the result for transforming PDFs, $g = h \circ f \circ h^{-1}$ will have the desired stationary PDF. Note, however, that the requirement that $h$ be a differentiable homeomorphism requires that the desired PDF $p_g$ be sufficiently well-behaved in a mathematical sense.




In addition, $g$ gives rise to the same Markov chains as $f$. An informal argument as to why this is true is the following. Since $h$ is a homeomorphism, it is by definition continuous and invertible. Because of the continuity of $h$, each subinterval $I_j$ in a Markov partition $M_f = \{I_j\}$ of $f$ is mapped by $h$ to a subinterval. Because of the invertibility of $h$, the images under $h$ of any two different partition elements do not intersect, and because $h$ maps $I$ onto $I$, these images cover the unit interval. Therefore, the set of subintervals $M_g = \{h(I_j)\}$ forms a partition for $g$. Also, since $h$ is a homeomorphism, this partition is a Markov partition, since $g$ acting on each element in $M_g$ is equivalent to $f$ acting on the corresponding element in $M_f$. That is, if $f(I_j) = I_k \cup I_l$, then $g(h(I_j)) = h(I_k) \cup h(I_l)$, which is easily verified:


\begin{align}
g(h(I_j)) &= (h \circ f \circ h^{-1})(h(I_j)) \tag{6.19}\\
&= h(f(h^{-1}(h(I_j)))) = h(f(I_j)) \tag{6.20}\\
&= h(I_k \cup I_l) = h(I_k) \cup h(I_l). \tag{6.21}
\end{align}

If $\{s_i\}$ denote the states associated with the partition $M_f$ such that $s_i$ is associated with $I_i$, and if we associate state $s_i$ with $h(I_i)$ as well, then state sequences that arise under the dynamics of $g$ are sample paths of a Markov chain, the same Markov chain as for $f$. This follows from two facts. First, $\mu_f(I_j) = \mu_g(h(I_j))$, where $\mu_f$ and $\mu_g$ are the measures induced by $p_f$ and $p_g$, respectively. This is a consequence of $p_g$ being the density induced by the transformation $h$. Second, the transition probabilities are the same. This is a consequence of the following relation, which we verify below:


$$\frac{\mu_f(I_{jk})}{\mu_f(I_j)} = \frac{\mu_g(h(I_j)_k)}{\mu_g(h(I_j))}, \tag{6.22}$$

where $I_{jk}$ denotes the portion of $I_j$ mapped to $I_k$ by $f$ and $h(I_j)_k$ denotes the portion of $h(I_j)$ mapped to $h(I_k)$ by $g$. The ratio on the left is the transition probability from state $s_j$ to $s_k$ for map $f$, and the ratio on the right is the transition probability from state $s_j$ to $s_k$ for map $g$. This is a generalization (briefly discussed in [61]) of the earlier result for piecewise linear maps of the unit interval, in which transition probabilities were given by the ratio of interval lengths. For the EMC map $f$, the two definitions of transition probabilities agree, as they must, since $p_f$ is constant over partition elements and hence $\mu_f(I_{jk})/\mu_f(I_j) = \lambda(I_{jk})/\lambda(I_j)$.

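To verify (6.22), we must show that $h(I_{jk}) = h(I_j)_k$; the equality of the ratios then follows from $\mu_f(I_{jk}) = \mu_g(h(I_{jk}))$ and $\mu_f(I_j) = \mu_g(h(I_j))$. To do so, we first note the following relations, which are straightforward to verify:

\begin{align}
I_{jk} &= I_j \cap f^{-1}(I_k) \tag{6.23}\\
h(I_j)_k &= h(I_j) \cap g^{-1}(h(I_k)) \tag{6.24}\\
g^{-1} &= h \circ f^{-1} \circ h^{-1}. \tag{6.25}
\end{align}

Substituting (6.25) in (6.24) yields

\begin{align}
h(I_j)_k &= h(I_j) \cap (h \circ f^{-1} \circ h^{-1})(h(I_k)) \tag{6.26}\\
&= h(I_j) \cap h(f^{-1}(I_k)) \tag{6.27}\\
&= h(I_j \cap f^{-1}(I_k)) \tag{6.28}\\
&= h(I_{jk}) \tag{6.29}
\end{align}

where the next to last equality holds because $h$ is a homeomorphism and, in particular, one-to-one.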
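The equality of transition probabilities can be checked empirically. The sketch below uses, as an illustrative assumption, a hypothetical two-element Markov map $f$ ($f(x) = 2x$ on $[0, 1/2[$, $f(x) = x - 1/2$ on $[1/2, 1]$, with TPM rows $[1/2, 1/2]$ and $[1, 0]$ and stationary PDF equal to $4/3$ and $2/3$ on the two elements), builds $h = F_g^{-1} \circ F_f$ for the target $p_g(x) = 1/2 + x$, and estimates the TPM of $g = h \circ f \circ h^{-1}$ on the partition $\{h(I_1), h(I_2)\}$ from one step of $g$ applied to samples drawn from $p_g$.

```python
import numpy as np

def f(x):
    """Hypothetical Markov map: I1 = [0, 1/2) -> I1 u I2 (slope 2), I2 = [1/2, 1] -> I1 (slope 1)."""
    return np.where(x < 0.5, 2.0 * x, x - 0.5)

P_f = np.array([[0.5, 0.5],
                [1.0, 0.0]])

def F_f(x):
    """Distribution function of the stationary PDF of f (4/3 on I1, 2/3 on I2)."""
    return np.where(x < 0.5, 4.0 / 3.0 * x, 2.0 / 3.0 + 2.0 / 3.0 * (x - 0.5))

def F_f_inv(u):
    return np.where(u < 2.0 / 3.0, 0.75 * u, 0.5 + 1.5 * (u - 2.0 / 3.0))

def F_g(x):
    return 0.5 * x + 0.5 * x ** 2  # target PDF p_g(x) = 0.5 + x

def F_g_inv(u):
    return (np.sqrt(1.0 + 8.0 * u) - 1.0) / 2.0

def h(x):
    return F_g_inv(F_f(x))

def h_inv(y):
    return F_f_inv(F_g(y))

def g(x):
    return h(f(h_inv(x)))

rng = np.random.default_rng(3)
x = F_g_inv(rng.uniform(size=200_000))  # samples with PDF p_g
y = g(x)
c = h(0.5)                               # partition point of g: h(I1) = [0, c), h(I2) = [c, 1]
s_now, s_next = (x >= c).astype(int), (y >= c).astype(int)
P_g = np.array([[np.mean(s_next[s_now == j] == k) for k in range(2)] for j in range(2)])
print(P_g)  # close to P_f
```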

The Markov partitions of $g$ are uniquely determined by those of $f$ and the transformation $h$. Whereas the map $f$ might give rise to uniform Markov partitions, in general this is not true of $g$. For example, if the support of $p_g$ is all of $\mathbb{R}$ (e.g., $p_g$ is a Gaussian PDF), so that $g$ is a map of the real line rather than of the unit interval, then two partition elements will have infinite length for any Markov partition, whereas all others will have finite length.

The question arises as to the relevance of unique, stationary PDFs for a map. The answer is that such a PDF, if it exists, determines the concentration of orbit points on the real line for orbits generated by most initial conditions. For example, Figures 6-5, 6-6, and 6-7 depict EMC maps and typical orbit segments generated by each map. The maps shown in Figures 6-6 (a) and 6-7 (a) were both derived from the map in Figure 6-5 (a) using the procedure outlined above. The stationary PDF of the map shown in Figure 6-5 (a) is constant over the unit interval; the stationary PDF of the map shown in Figure 6-6 (a) equals 4 over the subinterval $[.4, .6[$ and $.25$ elsewhere; and the stationary PDF of the map shown in Figure 6-7 (a) equals 4 over the subinterval $[0, .2[$ and $.25$ elsewhere. For each map, the orbit segments reflect the stationary PDF; in Figures 6-6 (b) and 6-7 (b), the orbit points are concentrated in regions where the probability density is large. The relation between stationary PDFs and the behavior of orbits, coupled with the ability to independently specify the stationary PDF of a Markov map and the Markov chains it gives rise to, may have practical value, such as in the design of signals for communication applications.

6.6  Multidimensional  Markov  Maps 

It  is  straightforward  to  synthesize  multidimensional  Markov  maps  that  give  rise  to  Markov 
chains,  a  surprising  fact  in  light  of  the  difficulty  generally  encountered  in  finding  Markov 
partitions  for  maps  with  dimension  greater  than  one.  With  one-dimensional  maps,  partition 
elements  of  Markov  partitions  are  subintervals  or  one-dimensional  volume  elements,  and 
the  requirement  for  a  partition  to  be  a  Markov  partition  is  that  each  subinterval  in  the 
partition  maps  to  a  union  of  subintervals  in  the  partition.  Similarly,  with  m-dimensional 
maps,  partition  elements  for  Markov  partitions  are  m-dimensional  volume  elements,  and 
the  requirement  for  a  partition  to  be  a  Markov  partition  is  that  each  volume  element  in 
the  partition  maps  to  a  union  of  volume  elements  in  the  partition.  As  one  might  expect, 
synthesizing  maps  of  the  m-dimensional  unit  cube  for  which  one  can  find  volume  elements 
satisfying  this  requirement  is  in  general  a  daunting  task. 

One set of piecewise linear, m-dimensional maps for which one can find Markov partitions and which also give rise to Markov chains is the set of hyperbolic toral automorphisms
[20].  However,  although  a  Markov  partition  is  known  to  exist  for  a  given  hyperbolic  toral 
automorphism,  finding  the  partition  elements  is  a  fairly  complex  procedure  even  in  two 
dimensions  [3,  10,  11]. 

We now discuss techniques for synthesizing m-dimensional MC maps which give rise to Markov chains by using one-dimensional MC maps as building blocks. The simplest technique is to start with $m$ one-dimensional MC maps and use each, or more precisely the state of each, as a component in an m-dimensional state vector. For example, if $x_i(n+1) = f_i(x_i(n))$ denotes the state equation for the $i$th one-dimensional MC map, then the state equation for a two-dimensional MC map is given by

$$x(n+1) = \begin{bmatrix} x_1(n+1) \\ x_2(n+1) \end{bmatrix} = F(x(n)) = \begin{bmatrix} f_1(x_1(n)) \\ f_2(x_2(n)) \end{bmatrix}. \tag{6.30}$$

It is straightforward to show that if a Markov partition for $f_1$ has $N_1$ elements and a Markov partition for $f_2$ has $N_2$ elements, then the corresponding Markov partition for $F$ has $N_1 N_2$ elements, with each partition element a rectangle in the unit square with each side parallel to one of the coordinate axes. Furthermore, Markov partitions for the multidimensional map also give rise to Markov chains. It is straightforward to show that if $I_{ij}$ denotes the partition element formed from the $i$th partition element of $f_1$ and the $j$th partition element of $f_2$, then the transition probability from $I_{ij}$ to $I_{kl}$ is given by $p_{ik} q_{jl}$, where $p_{ik}$ is the transition probability from the $i$th state to the $k$th state of $f_1$ and $q_{jl}$ is the transition probability from the $j$th state to the $l$th state of $f_2$.
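With the state $(i, j)$ stored at index $i N_2 + j$, the product rule $p_{ik} q_{jl}$ says that the TPM of $F$ is the Kronecker product of the two one-dimensional TPMs. The sketch below checks this with NumPy's `np.kron` on two illustrative TPMs (assumptions, not from the text) and confirms that the invariant vector of the product chain is the Kronecker product of the individual invariant vectors, as the independence of the components suggests. The helper `stationary_vector` is a hypothetical name.

```python
import numpy as np

def stationary_vector(P):
    """Solve Pi P = Pi with sum(Pi) = 1."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

P1 = np.array([[0.5, 0.5],
               [1.0, 0.0]])            # illustrative TPM of f_1 (N_1 = 2)
P2 = np.array([[0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0],
               [0.5, 0.5, 0.0]])       # illustrative TPM of f_2 (N_2 = 3)

# Entry [(i, j), (k, l)] = P1[i, k] * P2[j, l], with (i, j) stored at index i * 3 + j.
P_F = np.kron(P1, P2)

i, j, k, l = 1, 2, 0, 1
print(P_F[i * 3 + j, k * 3 + l], P1[i, k] * P2[j, l])  # both 0.5
print(np.allclose(stationary_vector(P_F), np.kron(stationary_vector(P1), stationary_vector(P2))))
```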

The dynamics of multidimensional MC maps synthesized this way are rather trivial, since the dynamics along each component are independent and are those of a one-dimensional MC map. However, by relaxing the restriction that the multidimensional MC map be a mapping from the unit m-cube to itself, one can easily extend this synthesis approach to create multidimensional MC maps with nontrivial dynamics. To do so, one applies a similarity transformation to the original multidimensional transformation. For example, let $A$ denote an invertible $m \times m$ matrix and let $x(n+1) = F(x(n))$ denote the state equation of an m-dimensional MC map synthesized by using a one-dimensional MC map for each component. Then $y(n) = A x(n)$ is the state of an MC map with state equation $y(n+1) = G(y(n))$ given by

$$y(n+1) = G(y(n)) = A F(A^{-1} y(n)). \tag{6.31}$$

An informal argument as to why $G$ is an MC map is the following. Because $A$ is invertible, it is a homeomorphism. Therefore, one can use the same reasoning as in the earlier discussion about synthesizing Markov maps with specified PDFs to argue that if $M_F = \{I_{ij}\}$ is a Markov partition for $F$, then $\{A I_{ij}\}$ is a Markov partition for $G$. Specifically, because $A$ is continuous, each element of $M_F$ is mapped by $A$ to a connected region (a parallelogram, since $A$ is linear), and because $A$ is invertible and the elements of $M_F$ are disjoint, the images of these elements are disjoint as well. Furthermore, the fraction of each partition element $A I_{ij}$ mapped to a partition element $A I_{kl}$ by $G$ is the same as the fraction of $I_{ij}$ mapped to $I_{kl}$ by $F$, because $A$ is a homeomorphism. Therefore, the state transition probabilities are the same for the two maps.
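The argument can be exercised in exact rational arithmetic. In the sketch below, which is illustrative rather than the construction used for the figures, each component of $F$ is a tent map with partition $\{[0, 1/2[, [1/2, 1]\}$, and $A$ is the matrix of (6.32), whose inverse happens to have integer entries. The state of a point $y$ of $G$ is the index $(i, j)$ of the element $A I_{ij}$ containing it, found by testing $A^{-1} y$ against the rectangles $I_{ij}$; the two maps produce identical state sequences, and $G^n(A x_0) = A F^n(x_0)$ holds exactly. The starting point and helper names are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
A = ((1, 2), (1, 1))          # matrix of (6.32)
A_INV = ((-1, 2), (1, -1))    # its exact inverse (det A = -1)

def matvec(M, v):
    return (M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1])

def tent(x):
    return 2 * x if x < HALF else 2 - 2 * x

def F(x):
    """Hypothetical product map: a tent map acting on each coordinate independently."""
    return (tent(x[0]), tent(x[1]))

def G(y):
    """Conjugated map (6.31): G(y) = A F(A^{-1} y)."""
    return matvec(A, F(matvec(A_INV, y)))

def label(x):
    """Index (i, j) of the rectangle I_ij containing x."""
    return (int(x[0] >= HALF), int(x[1] >= HALF))

def state_sequence(step, z, to_rect_coords, n):
    """Labels of the first n orbit points; to_rect_coords maps a point back to x-coordinates."""
    states = []
    for _ in range(n):
        states.append(label(to_rect_coords(z)))
        z = step(z)
    return states, z

x0 = (Fraction(123456, 1000003), Fraction(2, 7))
states_F, xn = state_sequence(F, x0, lambda x: x, 200)
states_G, yn = state_sequence(G, matvec(A, x0), lambda y: matvec(A_INV, y), 200)
print(states_F == states_G, yn == matvec(A, xn))  # True True
```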

As noted above, each partition element $I_{ij}$ is a rectangle with each side parallel to a coordinate axis; it follows that each partition element $A I_{ij}$ is a parallelogram and not necessarily a rectangle. Also, in contrast to the components of the state vector of $F$, each of which is the state of a one-dimensional MC map with dynamics independent of the other components, the components of the state vector of $G$ are not independent in general, and each does not correspond to the state of a one-dimensional MC map.

Figure 6-8 depicts 4000 consecutive orbit points generated by a two-dimensional MC map $F$ synthesized by using the maps shown in Figures 6-6 (a) and 6-7 (a) to generate the two components of the state vector; one component of the state vector is plotted versus the other. Figure 6-9 depicts 4000 consecutive orbit points generated by the map $A F A^{-1}$, where $A$ is given by

$$A = \begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix}. \tag{6.32}$$

Figure 6-8: Two-dimensional EMC map


By  appropriately  selecting  the  one-dimensional  maps  for  the  components  of  F  and  the 
transformation  A,  one  can  generate  multidimensional  MC  maps  with  interesting  attractor 
patterns. 
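A minimal Python sketch of the construction in (6.31) and (6.32) follows. The component maps f1 and f2 are hypothetical stand-ins (the maps of Figures 6-6 (a) and 6-7 (a) are not reproduced here), chosen as simple MC maps with affine branches of slope ±3. The sketch iterates G = A F A^{-1} directly and can be used to check the conjugacy relation G(Ax) = A F(x); the clamping step is an assumption added only to keep round-off from pushing $A^{-1}y$ outside the unit square.

```python
# Sketch of y(n+1) = G(y(n)) = A F(A^{-1} y(n)) from (6.31), with A from (6.32).
# f1 and f2 are hypothetical component maps, not the maps of Figures 6-6(a), 6-7(a).

A = ((1.0, 2.0),
     (1.0, 1.0))


def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested tuples."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("matrix is singular")
    return ((d / det, -b / det), (-c / det, a / det))


A_INV = inverse_2x2(A)


def mat_vec(m, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])


def f1(x):
    # Hypothetical MC map: three increasing affine branches of slope 3.
    return (3.0 * x) % 1.0


def f2(x):
    # Hypothetical MC map: continuous zigzag with three branches of slope +3, -3, +3.
    if x < 1.0 / 3.0:
        return 3.0 * x
    if x < 2.0 / 3.0:
        return 2.0 - 3.0 * x
    return 3.0 * x - 2.0


def clamp_unit(t):
    # Round-off in A^{-1} y can push a component slightly outside [0, 1]; clamp it back.
    return min(max(t, 0.0), 1.0)


def F(x):
    """Two-dimensional MC map: f1 acts on the first component, f2 on the second."""
    return (f1(clamp_unit(x[0])), f2(clamp_unit(x[1])))


def G(y):
    """Conjugated map G = A o F o A^{-1} of (6.31)."""
    return mat_vec(A, F(mat_vec(A_INV, y)))


def orbit_G(x0, n_points):
    """n_points consecutive orbit points of G, starting from y(0) = A x0."""
    y = mat_vec(A, x0)
    points = [y]
    for _ in range(n_points - 1):
        y = G(y)
        points.append(y)
    return points


pts = orbit_G((0.2718281828, 0.5772156649), 4000)
```

Plotting the second coordinate of `pts` against the first gives a picture of the kind shown in Figure 6-9; because A maps the unit square onto a parallelogram, every point lies in the region $0 \le y_1 \le 3$, $0 \le y_2 \le 2$.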

We can generalize this result by letting y(n) = H(x(n)) for any homeomorphism H, which leads to the map with state equation y(n + 1) = G(y(n)) given by

$$ y(n+1) = G(y(n)) = H\!\left(F\!\left(H^{-1}(y(n))\right)\right). \tag{6.33} $$

Using reasoning similar to that used above, one can show that because H is a homeomorphism, the image of each (rectangular) partition element $I_{ij}$ under H is a connected region in the range of H, the image is disjoint from the images of all other partition elements, and the set $\{H(I_{ij})\}$ is a Markov partition for G. One can also show that the two maps give rise to the same Markov chains.

Figure 6-9: Two-dimensional EMC map derived from the map shown in Figure 6-8.
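One step behind this claim can be made explicit with a short induction (a worked supplement, not a derivation quoted from the text): iterating (6.33), the inner $H^{-1}$ and $H$ cancel, so

$$ \begin{aligned} G^{2}(y) &= H\bigl(F(H^{-1}(H(F(H^{-1}(y)))))\bigr) = H\bigl(F^{2}(H^{-1}(y))\bigr), \\ G^{n}(y) &= H\bigl(F^{n}(H^{-1}(y))\bigr), \qquad n \ge 1 . \end{aligned} $$

Hence $y(0) = H(x(0))$ implies $y(n) = H(x(n))$ for every n, and because H is a bijection, y(n) lies in $H(I_{ij})$ exactly when x(n) lies in $I_{ij}$; the two orbits visit corresponding partition elements in the same order.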


6.7  Practical  Considerations 

The discussion thus far has ignored a practical problem one encounters when simulating certain MC maps on digital computers. The problem arises because digital computers can store and manipulate only dyadic rationals. As noted earlier, every rational number on the unit interval is an eventually periodic point of any MC map for which the slope of each affine segment is integer-valued. As a result, on a computer every initial condition is either periodic or eventually periodic, and thus no computed orbit is dense or exhibits the randomlike behavior expected of orbits generated by such maps.

Although one might argue that this situation arises with all maps, not just MC maps, the effects are more pronounced with certain MC maps. For example, consider the shift map given by $x_{n+1} = 2x_n \pmod 1$. The name shift map arises from the fact that the map shifts the binary representation of $x_n$ left one place and retains the fractional part of the result. All rational points on the unit interval are periodic or eventually periodic points of the shift map, and in theory almost all irrational points (in the sense of Lebesgue measure) have dense orbits. However, since a computer stores only dyadic rationals and since the shift map shifts the binary representation of $x_n$ left one place, the computer-generated orbit of the shift map for any initial condition $x_0$ becomes zero-valued after a finite number of iterations.
Figure 6-10 depicts the computer-generated orbit corresponding to an irrational initial condition. Although the actual initial condition is irrational, the computer approximates it with a dyadic rational, and the orbit becomes zero-valued after a finite number of iterations.

Figure 6-10: Orbit segment for shift map with initial condition
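The collapse is easy to reproduce. The following Python sketch, an illustration rather than the code used for Figure 6-10, iterates the shift map in IEEE-754 double precision from $x_0 = \pi - 3$ (an assumed stand-in for the irrational initial condition) and uses `fractions.Fraction` to expose the exact dyadic rational the computer actually stores.

```python
from fractions import Fraction
import math


def shift_map(x):
    """Shift map x -> 2x (mod 1); for x in [0, 1) both operations are exact in binary floating point."""
    return (2.0 * x) % 1.0


def steps_to_zero(x0, max_steps=2000):
    """Number of iterations until the computed orbit is exactly 0.0 (None if it never is)."""
    x = x0
    for n in range(max_steps + 1):
        if x == 0.0:
            return n
        x = shift_map(x)
    return None


x0 = math.pi - 3.0                      # stand-in for an irrational initial condition
denominator = Fraction(x0).denominator  # exact value stored by the computer is m / 2**p
exponent = denominator.bit_length() - 1

print(denominator == 2 ** exponent)     # True: x0 is stored as a dyadic rational
print(steps_to_zero(x0) == exponent)    # True: each iteration removes one binary digit
```

If $x_0 = m/2^p$ with m odd, one iteration gives a value with denominator $2^{p-1}$, so the computed orbit reaches zero after exactly p iterations, and p is bounded by the precision of the floating-point format.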

There are several practical, though not fully theoretically justified, ways to circumvent this undesirable situation when simulating MC maps such as the shift map on a computer. First, for each affine segment whose slope is an even integer, one can perturb the slope slightly. For example, one might replace a slope of 2 with a slope of 1.9999999999. Unfortunately, the substitution changes the dynamics of the map as well as any stationary PDFs. In fact, the resulting map may not even have a stationary PDF, even though the original map is ergodic. Intuitively, one would expect the dynamics of the new map to be close to those of the original. However, it is difficult, if not impossible, to evaluate the effect of perturbed slopes analytically, and it is unclear whether the invariant density or dynamics of the perturbed map converge to those of the unperturbed map as the perturbations go to zero.

Alternatively, one can add a small driving-noise term to the state equation of an MC map. This yields the following nondeterministic state equation:

$$ x_{n+1} = f(x_n) + w_n, \tag{6.34} $$

where $\{w_n\}$ is a white-noise sequence. If the PDF of each noise term $w_n$ is constant with support $[-a, a]$ for some constant a, then, as shown in [13], the stationary PDF of the driven system converges in $L^1$-norm to that of the undriven system as the PDF of the driving noise converges to an impulse at the origin, i.e., as the bounding constant a goes to zero. Furthermore, in contrast to the undriven system, which may have an infinite set of periodic points, the driven system has no periodic points; as a result, the orbit generated by each initial condition exhibits the expected randomlike behavior.
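As a rough illustration of (6.34), the sketch below drives the shift map with noise uniform on $[-a, a]$, starting from $x_0 = 1/2$, a point whose undriven orbit reaches zero after one step. How the driven state is kept in the unit interval is not specified above; wrapping modulo 1 (treating the interval as a circle) is an assumption of this sketch, as are the values of a, the orbit length, and the seed.

```python
import random


def driven_shift_orbit(x0, n_points, a, seed=0):
    """Orbit of x(n+1) = f(x(n)) + w(n), f the shift map, w(n) ~ Uniform[-a, a].

    Wrapping the sum back into [0, 1) modulo 1 is an assumption of this sketch.
    """
    rng = random.Random(seed)
    x = x0
    orbit = [x]
    for _ in range(n_points - 1):
        x = ((2.0 * x) % 1.0 + rng.uniform(-a, a)) % 1.0
        if x >= 1.0:  # (-tiny) % 1.0 can round up to 1.0; fold it back to 0.0
            x = 0.0
        orbit.append(x)
    return orbit


orbit = driven_shift_orbit(0.5, 20000, a=1e-3, seed=1)
mean = sum(orbit) / len(orbit)  # should be near 1/2, the mean of the uniform stationary PDF
```

Unlike the undriven computation, this orbit does not collapse: every step injects fresh low-order bits through the noise term.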

Similarly, if one expresses $w_n$ as $\epsilon e_n$, where $\epsilon$ is a small positive constant and $\{e_n\}$ is an independent, identically distributed random sequence with nonzero density over $\mathbb{R}$, then the stationary PDF of the driven system converges to that of the undriven system in $L^1$-norm as $\epsilon \to 0$, as a consequence of a more general result given in [50, p. 289, Theorem 10.6.1].

A third technique, the one used for the examples in this chapter, consists of scaling and then rescaling each of the affine mappings comprising the MC map. In particular, the affine map

$$ f(x) = \tau x + \beta \tag{6.35} $$

is replaced by the theoretically equivalent affine map

$$ f(x) = \frac{1}{\gamma}\left[(\gamma\tau)\,x + (\gamma\beta)\right], \tag{6.36} $$

where $\gamma$ is an irrational number and the parenthesized expressions are evaluated first. Figures 6-11 (a) and (b) show the first 100 orbit points and orbit points 1001 to 1100, respectively, for the same map and initial condition used for Figure 6-10, but with the affine transformations comprising the map scaled and rescaled as indicated above by the constant $\gamma = \sqrt{11}$. The orbit exhibits the randomlike behavior expected with this map and initial condition.

Figure 6-11: Orbit segments for shift map with irrational initial condition and affine transformation scaled and rescaled by $\gamma = \sqrt{11}$. (a) Orbit points 1-100; (b) Orbit points 1001-1100.
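The third technique can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code behind Figure 6-11: it writes the shift map as the two affine segments 2x on [0, 1/2) and 2x - 1 on [1/2, 1], evaluates them either directly or as in (6.36) with $\gamma = \sqrt{11}$, and starts both evaluations from the assumed stand-in initial condition $\pi - 3$.

```python
import math

GAMMA = math.sqrt(11.0)

# Shift map as affine segments (lower, upper, tau, beta): f(x) = tau * x + beta on [lower, upper).
SHIFT_SEGMENTS = [(0.0, 0.5, 2.0, 0.0), (0.5, 1.0, 2.0, -1.0)]


def segment_index(x, segments):
    """Index of the affine segment whose domain contains x (the last one also takes x = 1.0)."""
    for i, (lower, upper, _, _) in enumerate(segments):
        if lower <= x < upper:
            return i
    return len(segments) - 1


def step_direct(x, segments):
    _, _, tau, beta = segments[segment_index(x, segments)]
    return tau * x + beta


def step_scaled(x, segments, gamma=GAMMA):
    """(6.36): (1/gamma) * [(gamma*tau) * x + (gamma*beta)], parenthesized terms first."""
    _, _, tau, beta = segments[segment_index(x, segments)]
    return ((gamma * tau) * x + (gamma * beta)) / gamma


def orbit(x0, n_points, step):
    xs = [x0]
    for _ in range(n_points - 1):
        xs.append(step(xs[-1], SHIFT_SEGMENTS))
    return xs


x0 = math.pi - 3.0
direct = orbit(x0, 1100, step_direct)  # exact doublings: collapses to 0.0 within a few dozen steps
scaled = orbit(x0, 1100, step_scaled)  # rounding in the scaled products keeps the orbit moving
```

The design point is that multiplication by 2 is exact in binary arithmetic, whereas the products with $2\gamma$ and the division by $\gamma$ are rounded, so each step refreshes the low-order bits instead of discarding them.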

An interesting property of this third technique is that orbit segments generated with it, at least for EMC maps in which the slope of each affine segment is an integer with absolute value greater than one, are deterministic both in a theoretical and in a certain practical sense. In particular, given an EMC map f and an (N + 1)-point orbit segment $\{x(i) = f^i(x(0))\}_{i=0}^{N}$ generated with this technique, the backward orbit segment $\{y(i)\}_{i=0}^{N}$ obtained by defining $y(N) = x(N)$ and $y(n) = f_n^{-1}(y(n+1))$ for $0 \le n \le N-1$, where $f_n$ denotes the affine segment of f applied to x(n), is the same as $\{x(i)\}_{i=0}^{N}$ to within machine precision. In other words, by running the final orbit point x(N) through successive compositions of the inverse system $f^{-1}$ implicitly defined by the orbit segment, one recovers the orbit segment to within machine precision. Figure 6-12 depicts the average, point-by-point, squared reconstruction error normalized by the variance of the original orbit segment $\{x(i)\}_{i=0}^{N}$ of the shift map, with the original orbit segment generated using the third technique discussed above. Figure 6-13 (b) shows analogous results for the EMC map shown in Figure 6-13 (a).

Figure 6-12: Normalized, average, point-by-point, squared reconstruction error (NRE) with orbit segments of length N + 1 for shift map (horizontal axis: log N).

As suggested by the figures, in contrast to orbit segments of the dissipative maps considered in earlier chapters, an orbit segment of an EMC map is recoverable from the final orbit point if the sequence of affine segments that gave rise to the orbit segment is known. This invertibility or recoverability property of EMC maps with affine segments having integer-valued slopes is due in part to the contractive nature of the inverse system $f^{-1}$. In the next chapter, we exploit this property in the derivation and implementation of an ML state estimator for use with these maps.
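The reconstruction experiment behind Figure 6-12 can be sketched as follows, again as an illustration rather than the original code. The forward orbit is generated with the scaled evaluation (6.36), the index of the affine segment applied at each step is recorded, and the final point is then run backward through the inverse of each recorded segment, $f_n^{-1}(y) = (y - \beta_n)/\tau_n$. The segment table, helper names, initial condition, and segment length are assumptions of this sketch.

```python
import math

GAMMA = math.sqrt(11.0)
# Shift map as affine segments (lower, upper, tau, beta): f(x) = tau * x + beta on [lower, upper).
SEGMENTS = [(0.0, 0.5, 2.0, 0.0), (0.5, 1.0, 2.0, -1.0)]


def segment_index(x):
    for i, (lower, upper, _, _) in enumerate(SEGMENTS):
        if lower <= x < upper:
            return i
    return len(SEGMENTS) - 1


def forward_orbit(x0, n_steps):
    """(N+1)-point orbit via (6.36), plus the index of the segment applied to each x(n)."""
    xs, used = [x0], []
    for _ in range(n_steps):
        i = segment_index(xs[-1])
        _, _, tau, beta = SEGMENTS[i]
        xs.append(((GAMMA * tau) * xs[-1] + (GAMMA * beta)) / GAMMA)
        used.append(i)
    return xs, used


def backward_orbit(x_final, used):
    """y(N) = x(N) and y(n) = f_n^{-1}(y(n+1)), with f_n the recorded segment for x(n)."""
    ys = [x_final]
    for i in reversed(used):
        _, _, tau, beta = SEGMENTS[i]
        ys.append((ys[-1] - beta) / tau)
    ys.reverse()
    return ys


def normalized_reconstruction_error(xs, ys):
    """Average squared point-by-point error divided by the variance of the original segment."""
    n = len(xs)
    mean = sum(xs) / n
    variance = sum((x - mean) ** 2 for x in xs) / n
    mse = sum((x - y) ** 2 for x, y in zip(xs, ys)) / n
    return mse / variance


xs, used = forward_orbit(math.pi - 3.0, 1000)
ys = backward_orbit(xs[-1], used)
nre = normalized_reconstruction_error(xs, ys)
```

Because each inverse segment has slope 1/2, an error present in y(n + 1) is halved in y(n), so the backward pass cannot amplify round-off; the resulting NRE stays near the level of squared machine precision.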
Chapter 7


Detection  and  State  Estimation 
with  MC  Maps 


7.1  Introduction 


As in Chapter 6, the class of maps of interest in this chapter is the class of MC maps, a class of piecewise linear unit-interval maps whose members give rise to Markov chains. However, whereas Chapter 6 focused on properties of these maps and on the synthesis of signals using these maps as building blocks, this chapter focuses on exploiting these properties in practical detection and state estimation algorithms, both optimal and suboptimal, for use with these maps. We first consider detection, in particular the problem of discriminating among a finite number of EMC maps when given a noise-corrupted orbit segment generated by one of the maps. We introduce a hidden-Markov-model (HMM) representation of this discrimination problem; the representation is exact if the orbit points are properly quantized and approximate otherwise. We then exploit this representation in computationally efficient, optimal and suboptimal discrimination algorithms, and we assess the performance of the algorithms. We also exploit the HMM representation in iterative, optimal and suboptimal ML algorithms for estimating both an unknown, constant scale factor applied to the orbit points and the variance of the corrupting noise. We conclude our discussion of detection by briefly considering an experimentally determined discriminability metric for use with these maps.

Our focus then shifts to state estimation with MC maps, in particular the problem of obtaining the ML estimate of each point of an orbit segment generated by an MC map when given a noise-corrupted version of that orbit segment. We first consider the issues and complications that arise when attempting to perform ML state estimation with MC maps. Next, we introduce an HMM representation of the state estimation problem. The representation is not exact, but it constitutes a fundamental component of an ML state estimator, which we derive and show to be optimal if the representation is appropriately chosen and suboptimal, but nonetheless potentially effective, otherwise.


7.2  Detection  of  EMC  Maps 

7.2.1  Problem  Scenario 

The close relation between EMC maps and Markov chains facilitates the detection or discrimination of these maps based on noise-corrupted orbit segments generated by the maps. In this section, we show that when the orbit points are properly quantized, one can perform optimal, computationally efficient discrimination among these maps, and that when the outputs are not quantized, one can perform computationally efficient, albeit suboptimal, discrimination.

The underlying problem scenario we consider is the following. We are given M one-dimensional EMC maps $\{f_k\}_{k=1}^{M}$, and an unobserved (N + 1)-point orbit segment $X = \{x(i)\}_{i=0}^{N}$ is generated by one of the maps. We are also given a set of N + 1 observations $Y = \{y(i)\}_{i=0}^{N}$, where y(n) is given by

$$ y(n) = h_k(x(n)) + v(n), \qquad 0 \le n \le N. \tag{7.1} $$

In this equation, $\{v(i)\}_{i=0}^{N}$ is a white-noise sequence that is assumed to be independent of the initial condition x(0) and of the chosen map $f_k$, and the variance of each v(i) is $\sigma^2$. Also, $h_k$ is a memoryless transformation that may depend on the map $f_k$. In Section 7.2.2 we consider the case in which $h_k$ is a quantizer. In Section 7.2.3 we focus on the case in which $h_k$ is the identity operator but briefly consider more general choices of $h_k$ as well. In Section 7.4, we consider ML estimation of $h_k$ when it is an unknown, constant scale factor.
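For concreteness, the following Python sketch simulates the observation model (7.1) with $h_k$ chosen as a quantizer of the kind used in Section 7.2.2, one that returns the midpoint of the partition element containing x(n). The uniform 8-element partition, the white Gaussian noise, the stand-in orbit points, and the function names are all assumptions made for illustration.

```python
import random


def midpoint_quantizer(x, n_elements):
    """Hypothetical h_k for a uniform partition of [0, 1]: midpoint of the element containing x."""
    j = min(int(x * n_elements), n_elements - 1)  # element index; x = 1.0 goes to the last element
    return (j + 0.5) / n_elements


def observe(orbit, h, sigma, seed=0):
    """y(n) = h(x(n)) + v(n), 0 <= n <= N, with v(n) white Gaussian of variance sigma**2."""
    rng = random.Random(seed)
    return [h(x) + rng.gauss(0.0, sigma) for x in orbit]


x = [0.03, 0.51, 0.99, 0.26]  # stand-in orbit points
y_clean = observe(x, lambda t: midpoint_quantizer(t, 8), sigma=0.0)
y_noisy = observe(x, lambda t: midpoint_quantizer(t, 8), sigma=0.1, seed=3)
```

With `sigma=0.0` the observations are just the quantization values, here 0.0625, 0.5625, 0.9375 and 0.3125.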

This scenario gives rise to several related problems. The problem of interest here is an M-ary hypothesis testing problem in which one seeks to determine, ideally in an optimal way, which of the M maps generated the unobserved orbit segment X and thus gave rise to the observations Y. One motivation for considering this problem is its relevance to a potential scheme for secure M-ary communication. With this scheme, an EMC map is associated with each of the M signals. To transmit a signal, one generates a sufficiently long orbit segment X from the corresponding map and transmits this segment or a transformation $h_k$ of the segment. If the channel is an independent, additive noise channel, the task at the receiver involves M-ary hypothesis testing, that is, determining which of the M maps generated the received, noise-corrupted orbit segment. This potential communication scheme has some similarity to spread-spectrum communication techniques that use binary chipping or pseudo-random noise (PN) sequences. However, in contrast to these techniques, the scheme does not require the receiver to be precisely synchronized to the transmitter; only a sufficiently long subsegment of each transmitted orbit segment needs to be isolated for detection. Also, because of the flexibility in designing EMC maps, one could choose the M maps as a function of the expected noise level of the channel. At low noise levels, the maps and the associated signals could be chosen to be nearly indistinguishable so as to minimize the possibility of interception, whereas at high noise levels the maps could be chosen to be quite dissimilar, possibly antipodal in the binary case as discussed in [65, 66, 67].

A fundamental result from detection theory is that for equally likely maps, the optimal detection rule, in a minimum-probability-of-error sense, is to choose the map, among the M, with the largest likelihood $p(Y \mid f_k)$, where $p(Y \mid f_k)$ is the PDF of the observation set Y conditioned on the map $f_k$ having generated the orbit segment X that gives rise to Y. The next two sections introduce optimal and suboptimal algorithms for calculating these likelihoods. In both sections, the transformations $\{h_k\}$, the M maps, and the variance $\sigma^2$ of the observation noise are assumed to be known. Section 7.4 considers optimal and suboptimal methods to partially overcome the need for these assumptions.
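Written out, with prior probabilities $P(f_k)$ the minimum-probability-of-error rule picks the map with the largest posterior probability, and the equal-prior case $P(f_k) = 1/M$ reduces it to the likelihood rule above. Because the logarithm is increasing, maximizing $\log p(Y \mid f_k)$ yields the same decision, which is convenient when likelihoods are computed in the log domain:

$$ \hat{k} = \arg\max_{1 \le k \le M} P(f_k)\, p(Y \mid f_k) = \arg\max_{1 \le k \le M} p(Y \mid f_k) = \arg\max_{1 \le k \le M} \log p(Y \mid f_k), $$

where the second equality uses $P(f_k) = 1/M$.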

7.2.2  Detection  with  Quantized  Orbit  Points 

One can efficiently compute the exact likelihoods used in the optimal detection rule when each of the M transformations $h_k$ is a quantizer that associates a single, unique value with each element of a Markov partition for $f_k$. That is, if $I_j^k$ denotes the jth partition element for map $f_k$ and $H_j^k$ denotes the value associated with this partition element by $h_k$, then $h_k(x) = H_j^k$ for all $x \in I_j^k$. (For example, $H_j^k$ might be the midpoint or an endpoint of $I_j^k$.) With the M quantizers chosen this way, the detection problem reduces to that of discriminating among M hidden Markov models (HMMs). As such, one can use the forward portion of the computationally efficient forward-backward algorithm [74, 75] to calculate the M likelihoods $\{p(Y \mid f_k)\}_{k=1}^{M}$.

Specifically, for each map $f_k$ the partition elements $\{I_j^k\}_{j=1}^{T_k}$ correspond to the unobserved states in the HMM associated with that map, where $T_k$ denotes the number of partition elements in the Markov partition associated with $f_k$. For the problem scenario introduced earlier, $o_j^k(n)$, the output at time n associated with $I_j^k$, is given by

$$ o_j^k(n) = H_j^k + v(n). \tag{7.2} $$

In other words, if $f_k$ generated the orbit segment $X = \{x(i)\}_{i=0}^{N}$ and $x(n) \in I_j^k$, then y(n), the observation at time n, is given by

$$ y(n) = h_k(x(n)) + v(n) = o_j^k(n) = H_j^k + v(n). \tag{7.3} $$

With this observation equation, $p(y(n) \mid [x(n) \in I_j^k, f_k])$, the PDF of the observation y(n) conditioned on the map $f_k$ having generated the orbit segment and x(n) being in partition element $I_j^k$, is given by

$$ p(y(n) \mid [x(n) \in I_j^k, f_k]) = p_v\!\left(y(n) - H_j^k\right), \tag{7.4} $$

where $p_v$ is the PDF of each term in the white-noise sequence.

We define the forward variable $\alpha_j^k(n)$ as follows:

$$ \alpha_j^k(n) = p\left(y(0), y(1), \ldots, y(n), x(n) \in I_j^k \mid f_k\right). \tag{7.5} $$

That is, $\alpha_j^k(n)$ is the joint PDF of the observations through time n and the event $x(n) \in I_j^k$, conditioned on map $f_k$ having generated the orbit segment $X = \{x(i)\}_{i=0}^{N}$. Note that $\alpha_j^k(n)$ can also be expressed as

$$ \alpha_j^k(n) = p\left(y(0), y(1), \ldots, y(n) \mid [x(n) \in I_j^k, f_k]\right) p\left(x(n) \in I_j^k \mid f_k\right). \tag{7.6} $$
Also, we let $P_k = [p_{ij}^k]_{i,j=1}^{T_k}$ denote the TPM associated with $f_k$, where $p_{ij}^k$ denotes the probability that $x(n) \in I_j^k$ given that $x(n-1) \in I_i^k$. (Note that $P_k$ and $p_{ij}^k$ have different meanings than in Chapter 6.)

Using these definitions and notational conventions, we have the following computationally efficient algorithm, the forward algorithm, for calculating each likelihood function and optimally discriminating among the maps:

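1. For each map $f_k$, compute $\alpha_j^k(n)$ as a function of n for $j = 1, \ldots, T_k$ with the following recursion:

$$ \begin{aligned} \alpha_j^k(0) &= p\left(y(0) \mid [x(0) \in I_j^k, f_k]\right) p\left(x(0) \in I_j^k \mid f_k\right), \\ \alpha_j^k(n+1) &= \left[\sum_{i=1}^{T_k} \alpha_i^k(n)\, p_{ij}^k\right] p\left(y(n+1) \mid [x(n+1) \in I_j^k, f_k]\right), \qquad n = 0, 1, \ldots, N-1. \end{aligned} \tag{7.7} $$

2. For each map, compute the likelihood $p(Y \mid f_k)$ by exploiting the relation

$$ p(Y \mid f_k) = \sum_{j=1}^{T_k} \alpha_j^k(N). \tag{7.8} $$

3. Choose the map $f_k$ for which $p(Y \mid f_k)$ is largest.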
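A compact Python sketch of steps 1 to 3 follows, for the quantized case with white Gaussian observation noise (the Gaussian choice matches the experiments of Section 7.3 but is otherwise an assumption). One deviation from (7.7) and (7.8) is deliberate: the forward variables are renormalized at every step and the logarithms of the normalizing constants are accumulated, a standard device for avoiding numerical underflow with long orbit segments, so the function returns $\log p(Y \mid f_k)$ rather than $p(Y \mid f_k)$. The two-state models in the usage lines are hypothetical.

```python
import math


def gaussian_pdf(v, sigma):
    """N(0, sigma**2) density, used here as p_v in (7.4)."""
    return math.exp(-0.5 * (v / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def log_likelihood(y, tpm, initial, levels, sigma):
    """log p(Y | f_k) from the forward recursion (7.7) and (7.8), rescaled at each step.

    tpm[i][j] = P(x(n) in I_j | x(n-1) in I_i), initial[j] = P(x(0) in I_j),
    levels[j] = H_j, the quantization value of partition element I_j.
    """
    T = len(levels)
    # n = 0: alpha_j(0) = p(y(0) | x(0) in I_j) P(x(0) in I_j)
    alpha = [gaussian_pdf(y[0] - levels[j], sigma) * initial[j] for j in range(T)]
    log_scale = 0.0
    for n in range(1, len(y)):
        # Rescale to avoid underflow; the removed factors are restored through log_scale.
        s = sum(alpha)
        log_scale += math.log(s)
        alpha = [a / s for a in alpha]
        # alpha_j(n) = [sum_i alpha_i(n-1) p_ij] p(y(n) | x(n) in I_j)
        alpha = [sum(alpha[i] * tpm[i][j] for i in range(T))
                 * gaussian_pdf(y[n] - levels[j], sigma) for j in range(T)]
    # (7.8): p(Y | f_k) = sum_j alpha_j(N), combined with the removed scale factors.
    return log_scale + math.log(sum(alpha))


def detect(y, models, sigma):
    """Step 3: index k of the model (tpm, initial, levels) with the largest likelihood."""
    scores = [log_likelihood(y, tpm, init, lev, sigma) for tpm, init, lev in models]
    return max(range(len(models)), key=lambda k: scores[k])


# Hypothetical two-state models sharing the quantization values H_1 = 0.25, H_2 = 0.75.
sticky = ([[0.9, 0.1], [0.1, 0.9]], [0.5, 0.5], [0.25, 0.75])
alternating = ([[0.1, 0.9], [0.9, 0.1]], [0.5, 0.5], [0.25, 0.75])
decision = detect([0.25, 0.25, 0.25, 0.75, 0.75, 0.75], [sticky, alternating], sigma=0.1)
```

For the usage sequence, which stays in one state for three steps before switching, the sticky model wins (`decision == 0`). With a uniformly distributed initial condition, the entries of `initial` would be the lengths of the partition elements.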

The detection algorithm requires specification of the initial state probabilities $p(x(0) \in I_j^k \mid f_k)$ for each map. These probabilities are particularly easy to determine if the initial condition is a random variable with constant PDF over the unit interval. In this case, $p(x(0) \in I_j^k \mid f_k)$ is given by the length of $I_j^k$. In the more general case, $p(x(0) \in I_j^k \mid f_k)$ is given by the PDF of x(0) integrated over $I_j^k$. However, as suggested by Corollary 1 in Chapter 6, the PDF of x(0) cannot be arbitrary if the Markov chain property of the maps is to hold. In particular, the PDF of x(0) must be constant over each partition element $I_j^k$ for the Markov chain property to apply to map k. In general, when this condition is violated, the partition elements no longer correspond to the states of a homogeneous Markov chain, but they may correspond to the states of an inhomogeneous Markov chain, that is, a Markov chain with time-varying state transition probabilities.
7.2.3 Detection with Unquantized Orbit Points


When each of the M transformations $h_k$ is the identity operator, the detection problem is that of discriminating among M EMC maps based on noisy observations of unquantized orbit points. In general, for the case of unquantized orbit points, optimal discrimination among the M maps is not computationally feasible because the initial condition x(0) is unknown and the maps are chaotic. However, if the EMC maps each give rise to arbitrarily fine Markov partitions, computationally efficient, albeit suboptimal, discrimination is still possible if we model the dynamics of each map as an HMM and apply the detection rule outlined earlier.

Specifically, we first select a sufficiently fine Markov partition for each map, with the necessary fineness of the partition dependent upon the M maps being used. We know of no optimality criterion for quantifying this fineness for a given set of maps. One suboptimal, practical method for choosing partition sizes is trial and error, with detection results obtained experimentally via Monte Carlo simulations. On the basis of experiments with various EMC maps that give rise to uniform Markov partitions, it appears that there is a threshold partition size: finer partitions offer little if any improvement in performance over a partition whose elements have length equal to this threshold size.

Having selected a Markov partition for each map, we apply the detection rule outlined in Section 7.2.2, with one fundamental change. In particular, we use the following expression for $o_j^k(n)$ in place of (7.2):

$$ o_j^k(n) = u_j^k(n) + v(n), \tag{7.9} $$

where $u_j^k(n)$ is a random variable that is independent of v(n) and has a constant PDF with support $I_j^k$. Thus, each observation y(n) is now the sum of two independent random variables, with conditional PDF $p(y(n) \mid [x(n) \in I_j^k, f_k])$ given by the convolution of the PDFs of $u_j^k(n)$ and v(n).
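With Gaussian observation noise (the case used in Section 7.3; other noise PDFs would need a different formula), this convolution has a closed form. If $I_j^k = [a, b)$ and v(n) has variance $\sigma^2$, then $p(y \mid [x(n) \in I_j^k, f_k]) = \frac{1}{b-a}\left[\Phi\left(\frac{y-a}{\sigma}\right) - \Phi\left(\frac{y-b}{\sigma}\right)\right]$, where $\Phi$ is the standard normal CDF. A minimal Python version, which could be used in place of (7.4) inside the recursion (7.7), is sketched below; the function names and the example partition element are assumptions.

```python
import math


def normal_cdf(z):
    """Standard normal CDF Phi(z) via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def uniform_plus_gaussian_pdf(y, a, b, sigma):
    """PDF of u + v with u ~ Uniform[a, b) and v ~ N(0, sigma**2) independent, as in (7.9)."""
    return (normal_cdf((y - a) / sigma) - normal_cdf((y - b) / sigma)) / (b - a)


# Density of an observation when x(n) lies in the partition element [0.25, 0.375).
p_inside = uniform_plus_gaussian_pdf(0.3, 0.25, 0.375, sigma=0.05)
```

As $\sigma \to 0$ the density approaches the uniform density $1/(b-a)$ on the partition element, and for large $\sigma$ it approaches a Gaussian centered at the element's midpoint, which is the quantized model (7.2) with the midpoint as quantization value.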

The motivation for modeling the noise-free output of each state as a uniform random variable with support over the corresponding partition element follows from two facts. First, as noted above, for the Markov chain property to hold, the PDF of the initial condition x(0) must be constant over each partition element. Second, as shown in [12], the Frobenius-Perron operator restricted to PDFs that are constant over each partition element of a Markov partition for an MC map is a linear operator on that set of PDFs; that is, it maps each such PDF to another PDF that is constant over each partition element. As a consequence, if the PDF of the initial condition x(0) is constant over each partition element, the PDF induced by f acting on x(0) is constant over each partition element as well, and, by induction, so is the PDF of every x(n). As a result, the PDF of x(n) conditioned on x(n) lying in a given partition element is constant over that partition element, thereby suggesting the appropriateness of modeling the noise-free output of each state of the Markov chain in the HMM as a uniform random variable.

It  is  straightforward  to  extend  this  suboptimal  detection  algorithm  to  the  case  in  which 
each  hk  is  not  the  identity,  but  instead  an  arbitrary,  piecewise  differentiable,  memoryless 
transformation.  In  this  case,  an  appropriate  model  for  the  noise-free  output  of  each  state 
is  that  of  a  random  variable  with  PDF  induced  by  hk  acting  on  a  uniformly  distributed 
random  variable,  where  the  region  of  support  of  the  uniformly  distributed  random  variable 
is  the  corresponding  partition  element. 

7.3  Detection  Examples 

In this section, we compare the performance of the optimal and suboptimal detection algorithms. We focus on the binary detection problem in which we seek to discriminate between two EMC maps based on a noise-corrupted orbit segment generated by one of the maps. For convenience, but not by necessity, all performance results were obtained with a Gaussian, white-noise sequence used for $\{v(i)\}_{i=0}^{N}$.

We first consider the two maps depicted in Figures 7-1 and 7-2, with typical orbit segments also shown in the figures. The maps are antipodal in the sense that for a given initial condition x(0), corresponding orbit points of the maps satisfy the relation $f_1^i(x(0)) = 1 - f_2^i(x(0))$ for $i > 0$. (The conditions required for antipodality were established in [65].)

Figure 7-1: EMC map $f_1$ and typical orbit segment. (a) EMC map; (b) Orbit segment.

Figure 7-2: EMC map $f_2$ and typical orbit segment. (a) EMC map; (b) Orbit segment.


Because the maps are antipodal, their power density spectra are identical. Both maps also have identical stationary PDFs given by the constant value one over the unit interval. Any uniform partition of the unit interval into 4N subintervals, where N is any positive integer, is a Markov partition for each map.

Figures 7-3 (a) and (b) depict the error probabilities as a function of the input SNR for an 8-element Markov partition and differently sized orbit segments.

Figure 7-3: Detection error probabilities $P_e$ for EMC maps $f_1$ and $f_2$ with 8-element Markov partitions and differently sized orbit segments used in the likelihood function. (a) Quantized orbit points; (b) Unquantized orbit points.


Each input SNR value is the ratio of the variance of a random variable with constant PDF over the unit interval (which equals 1/12) to the observation noise variance. Because the stationary PDF of each map has the constant value one over the unit interval, these SNR values are the same as SNR values computed with the actual signal variance (given by the variance of points in a typical orbit) when the orbit points are unquantized, but they may differ slightly from such values with coarsely quantized orbit points. The curves are parameterized by the length of the noise-corrupted orbit segment used in the likelihood function. Ten thousand independent trials, with each map generating the orbit segments for half the trials, were used to obtain error probabilities greater than or equal to 0.01, whereas one hundred thousand independent trials were used to obtain error probabilities below this value. The plotted results indicate that for a given input SNR, performance improves with an increase in the orbit segment size, as one might expect. In addition, the results suggest that the performance of the suboptimal detector (the one using unquantized orbit points) is comparable to that of the optimal detector (the one using quantized orbit points) with 20-point and 40-point orbit segments.
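Because the input SNR is defined relative to the fixed signal variance 1/12, a target SNR in decibels fixes the noise variance as $\sigma^2 = (1/12)\,10^{-\mathrm{SNR}/10}$. The helpers below (their names are illustrative) perform this conversion and also compute the binomial standard error $\sqrt{P_e(1-P_e)/n}$ of an error probability estimated from n independent trials, which puts the trial counts quoted above in perspective.

```python
import math


def noise_std_for_snr(snr_db, signal_variance=1.0 / 12.0):
    """Noise standard deviation sigma with 10*log10(signal_variance / sigma**2) = snr_db."""
    return math.sqrt(signal_variance * 10.0 ** (-snr_db / 10.0))


def pe_standard_error(pe, trials):
    """Binomial standard error of an error probability estimated from independent trials."""
    return math.sqrt(pe * (1.0 - pe) / trials)


sigma_0db = noise_std_for_snr(0.0)             # sqrt(1/12), about 0.289
se_at_1e2 = pe_standard_error(0.01, 10_000)    # about 1.0e-3, roughly 10% of P_e
se_at_1e3 = pe_standard_error(0.001, 100_000)  # about 1.0e-4, again roughly 10% of P_e
```

With these counts, the relative standard error at the switch-over point is about 10% in both regimes.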

Figure 7-4 shows performance results for differently sized Markov partitions and a 20-point orbit segment used in the likelihood function. The results suggest that the detection algorithm is insensitive to the size of the Markov partition used for discriminating between the two maps.

Figure 7-4: Detection error probabilities $P_e$ for EMC maps $f_1$ and $f_2$ with 20-point orbit segments and differently sized Markov partitions used in likelihood functions. (a) Quantized orbit points; (b) Unquantized orbit points (horizontal axis: input SNR in dB).

Figures 7-7 and 7-8 depict performance results analogous to those in Figures 7-3 and 7-4, but obtained with the two EMC maps $f_3$ and $f_4$ shown, along with typical orbit segments, in Figures 7-5 and 7-6. EMC maps $f_3$ and $f_4$ satisfy a weaker form of antipodality than maps $f_1$ and $f_2$. Specifically, while it is not true that $f_3^i(x(0)) = 1 - f_4^i(x(0))$ for $i > 0$, it is true that $f_3(x(0)) = 1 - f_4(1 - x(0))$. A comparison of the performance results for $f_1$ and $f_2$ with the results for $f_3$ and $f_4$ suggests that the former results are slightly better than the latter, as one might expect in light of the stronger form of antipodality satisfied by $f_1$ and $f_2$.
Figure 7-5: EMC map $f_3$ and typical orbit segment. (a) EMC map; (b) Orbit segment.

Figure 7-6: EMC map $f_4$ and typical orbit segment. (a) EMC map; (b) Orbit segment.

Figure 7-7: Detection error probabilities $P_e$ for EMC maps $f_3$ and $f_4$ with 8-element Markov partitions and differently sized orbit segments used in likelihood functions. (a) Quantized orbit points; (b) Unquantized orbit points.

Figure 7-8: Detection error probabilities $P_e$ for EMC maps $f_3$ and $f_4$ with 20-point orbit segments and differently sized Markov partitions used in likelihood functions. (a) Quantized orbit points; (b) Unquantized orbit points.

One should not infer from these examples that the detection algorithms are useful only with antipodal maps. In fact, there may be detection applications, such as secure communication over low-noise channels, in which the use of antipodal maps is neither necessary nor desirable.


7.4  Scale  Factor  and  Noise  Variance  Estimation 

The detection algorithms introduced in Sections 7.2.2 and 7.2.3 require that the observation noise variance as well as each output transformation $h_k$ be known. However, these quantities may be unknown or only partially known in practical applications. For example, in applications in which orbit segments from EMC maps are transmitted over a channel, the orbit points may undergo a number of unknown transformations, such as linear and nonlinear filtering, fading, scaling, and noise corruption. In this section we focus on two of these transformations: scaling and noise corruption. In particular, we discuss how to simultaneously estimate a constant, multiplicative scale factor applied to each orbit point and the variance of an additive, Gaussian, white, corrupting noise.

By exploiting the relation between noise-corrupted orbit segments of Markov maps and sample paths of hidden Markov models, we can derive computationally efficient iterative algorithms for estimating an unknown scale factor and noise variance. When the orbit points are properly quantized, the estimation algorithms are iterative ML estimation algorithms. With unquantized points, the algorithms are not ML estimation algorithms but are nevertheless potentially effective. The specific estimation algorithms we use are variations of the Baum-Welch re-estimation procedure [74, 75], an iterative ML technique for estimating parameters of hidden Markov models from sample observations. Of interest here is the case in which the EMC map is known, as well as a Markov partition for the map and the TPM of the Markov chain corresponding to the partition.

We first consider the case of quantized orbit points. For this case, we can formulate the re-estimation procedure for estimating the unknown noise variance and scale factor as follows. As noted above, we assume that we are given a Markov map $f$, a Markov partition $\{I_j\}_{j=1}^{T}$ for the map, a TPM $P = [p_{ij}]$ for the Markov chain associated with the partition, and an associated set of constant values $\{H_j\}_{j=1}^{T}$, with $H_j$ the value associated with partition element $I_j$. The set $\{H_j\}$ consists of the state values or, equivalently, the quantization values of the orbit points, with the value $H_j$ associated with each orbit point lying in $I_j$. In the following equations, $H(n)$ denotes the quantization value associated with the orbit point $x(n)$, i.e., $H(n) = H_j$ if $x(n) \in I_j$. Finally, we assume that an $(N+1)$-point orbit segment $X = \{x(i)\}_{i=0}^{N}$ is generated by the map and that we observe the observation set $Y = \{y(i)\}_{i=0}^{N}$, where $y(n)$ is given by

$$ y(n) = k\,H(n) + v(n) \qquad (7.10) $$

and where $k$ is the unknown scale factor we seek to estimate and $\{v(i)\}_{i=0}^{N}$ is a white Gaussian noise sequence with unknown variance $\sigma^2$, which we also seek to estimate.

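As in Sections 7.2.2 and 7.2.3, we define the forward variable $\alpha_j(n)$ as

$$ \alpha_j(n) = p\big(y(0), y(1), \ldots, y(n), x(n) \in I_j\big), \qquad (7.11) $$

which, as indicated earlier, is recursively computable:

$$ \alpha_j(0) = p\big(y(0) \mid x(0) \in I_j\big)\, P\big(x(0) \in I_j\big) \qquad (7.12) $$

$$ \alpha_j(n+1) = \Big[\sum_{l=1}^{T} \alpha_l(n)\, p_{lj}\Big]\, p\big(y(n+1) \mid x(n+1) \in I_j\big), \quad n = 0, 1, \ldots, N-1. \qquad (7.13) $$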
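The recursion (7.12)-(7.13) translates directly into code. The Python sketch below is illustrative only: it assumes a hypothetical 4-element uniform partition of the doubling map $f(x) = 2x \bmod 1$, with the partition midpoints as the values $H_j$ and a uniform initial distribution, and the helper names `gaussian_pdf`, `forward`, and `brute_force_likelihood` are our own. As a sanity check it compares $\sum_j \alpha_j(N)$, which equals $p(Y)$, with a brute-force sum over all state paths.

```python
import itertools
import math


def gaussian_pdf(y, mean, var):
    """Scalar Gaussian density N(y; mean, var)."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def forward(y, P, pi, H, k, var):
    """Unscaled forward variables alpha[n][j] = p(y(0..n), x(n) in I_j), per (7.12)-(7.13).

    P[l][j] = p_{lj}; pi[j] = P(x(0) in I_j); state j emits y(n) = k * H[j] + v(n).
    """
    T = len(H)
    alpha = [[gaussian_pdf(y[0], k * H[j], var) * pi[j] for j in range(T)]]  # (7.12)
    for n in range(len(y) - 1):
        prev = alpha[-1]
        alpha.append([
            sum(prev[l] * P[l][j] for l in range(T))      # bracketed sum in (7.13)
            * gaussian_pdf(y[n + 1], k * H[j], var)
            for j in range(T)
        ])
    return alpha


def brute_force_likelihood(y, P, pi, H, k, var):
    """p(Y) summed over all T**(N+1) state paths; feasible only for tiny N."""
    T = len(H)
    total = 0.0
    for path in itertools.product(range(T), repeat=len(y)):
        prob = pi[path[0]] * gaussian_pdf(y[0], k * H[path[0]], var)
        for n in range(1, len(y)):
            prob *= P[path[n - 1]][path[n]] * gaussian_pdf(y[n], k * H[path[n]], var)
        total += prob
    return total


# Hypothetical example: 4-element uniform partition of [0, 1] for the doubling map.
P = [[0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, 0.5, 0.5],
     [0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, 0.5, 0.5]]
pi = [0.25] * 4
H = [0.125, 0.375, 0.625, 0.875]   # partition midpoints used as quantization values
y = [0.3, 0.9, 1.1, 0.2, 0.6]      # made-up observations

alpha = forward(y, P, pi, H, k=2.0, var=0.1)
p_Y = sum(alpha[-1])               # p(Y) = sum_j alpha_j(N)
p_Y_brute = brute_force_likelihood(y, P, pi, H, k=2.0, var=0.1)
print(p_Y, p_Y_brute)
```

The unscaled recursion can underflow for long orbit segments; rescaling $\alpha(n)$ at every step is the usual remedy.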


Now we also define the backward variable $\beta_j(n)$ as

$$ \beta_j(n) = p\big(y(n+1), y(n+2), \ldots, y(N) \mid x(n) \in I_j\big), \qquad (7.14) $$

which is recursively computable backward in time as follows:

$$ \beta_j(N) = 1 \qquad (7.15) $$

$$ \beta_j(n) = \sum_{l=1}^{T} p_{jl}\, p\big(y(n+1) \mid x(n+1) \in I_l\big)\, \beta_l(n+1), \quad n = 0, \ldots, N-1. \qquad (7.16) $$

Finally, we define the conditional state variable $\gamma_j(n)$ as

$$ \gamma_j(n) = P\big(x(n) \in I_j \mid Y\big), \qquad (7.17) $$

where $Y$ is the observation set $\{y(i)\}_{i=0}^{N}$. As indicated by (7.17), $\gamma_j(n)$ denotes the probability that $x(n) \in I_j$ conditioned on the entire observation set. The forward, backward, and conditional state variables satisfy the following relation:

$$ \gamma_j(n) = \frac{\alpha_j(n)\,\beta_j(n)}{p(Y)} = \frac{\alpha_j(n)\,\beta_j(n)}{\sum_{l=1}^{T} \alpha_l(n)\,\beta_l(n)}, \qquad (7.18) $$

where the denominator is simply a normalization constant.
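A similar sketch, again using the hypothetical doubling-map example with made-up observations, computes the backward variables (7.15)-(7.16) and the conditional state probabilities (7.18). Since $\sum_j \alpha_j(n)\beta_j(n) = p(Y)$ for every $n$, the normalization in (7.18) yields the same constant at each time step, which the final check exploits. Function names are illustrative.

```python
import math


def gaussian_pdf(y, mean, var):
    """Scalar Gaussian density N(y; mean, var)."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def forward_backward(y, P, pi, H, k, var):
    """Unscaled alpha (7.12)-(7.13), beta (7.15)-(7.16), and gamma (7.18)."""
    T, N = len(H), len(y) - 1
    # emis[n][j] = p(y(n) | x(n) in I_j)
    emis = [[gaussian_pdf(y[n], k * H[j], var) for j in range(T)] for n in range(N + 1)]

    alpha = [[emis[0][j] * pi[j] for j in range(T)]]
    for n in range(N):
        alpha.append([sum(alpha[n][l] * P[l][j] for l in range(T)) * emis[n + 1][j]
                      for j in range(T)])

    beta = [[1.0] * T for _ in range(N + 1)]     # beta_j(N) = 1, (7.15)
    for n in range(N - 1, -1, -1):               # n = N-1, ..., 0, (7.16)
        beta[n] = [sum(P[j][l] * emis[n + 1][l] * beta[n + 1][l] for l in range(T))
                   for j in range(T)]

    gamma = []
    for n in range(N + 1):
        w = [alpha[n][j] * beta[n][j] for j in range(T)]
        norm = sum(w)                            # denominator of (7.18), i.e. p(Y)
        gamma.append([wj / norm for wj in w])
    return alpha, beta, gamma


# Same hypothetical doubling-map example.
P = [[0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, 0.5, 0.5],
     [0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, 0.5, 0.5]]
pi = [0.25] * 4
H = [0.125, 0.375, 0.625, 0.875]
y = [0.3, 0.9, 1.1, 0.2, 0.6]

alpha, beta, gamma = forward_backward(y, P, pi, H, k=2.0, var=0.1)
# sum_j alpha_j(n) beta_j(n) should be the same value, p(Y), at every n.
p_Y_by_n = [sum(a * b for a, b in zip(alpha[n], beta[n])) for n in range(len(y))]
print(p_Y_by_n)
```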

It follows from more general results in [53, 75] that when each state $S_j$ of a hidden Markov model has a scalar Gaussian output PDF with mean $m_j$ and variance $\sigma_j^2$, the re-estimation equations for $\hat m_j$ and $\hat\sigma_j^2$, the estimates of $m_j$ and $\sigma_j^2$, are given by

$$ \hat m_j = \frac{\sum_{i=0}^{N} \gamma_j(i)\, y(i)}{\sum_{i=0}^{N} \gamma_j(i)} \qquad (7.19) $$

$$ \hat\sigma_j^2 = \frac{\sum_{i=0}^{N} \gamma_j(i)\,\big(y(i) - \hat m_j\big)^2}{\sum_{i=0}^{N} \gamma_j(i)} \qquad (7.20) $$

Note that if $\gamma_j(i) = 1$ for all $i$, the re-estimation formulas reduce to the sample mean and variance. The re-estimation formulas are iterative: the estimated means and variances are used to calculate the conditional state probabilities $\{\gamma_j(n)\}$, which are subsequently used in the above equations to re-estimate the means and variances.

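For the problem of interest here, $m_j = k H_j$, where $k$ is the unknown scale factor and $H_j$ is known. For this problem, a straightforward derivation (for which we know of no references), analogous to that used in [53] to obtain the above re-estimation equations, yields the following re-estimation equation for $\hat k$, the estimate of the scale factor $k$:

$$ \hat k = \frac{\sum_{i=0}^{N}\sum_{j=1}^{T} \gamma_j(i)\, H_j\, y(i)}{\sum_{i=0}^{N}\sum_{j=1}^{T} \gamma_j(i)\, H_j^2}, \qquad (7.21) $$

which is a weighted average of the observations, as is the case with (7.19), but with the weights now also dependent on the known values $\{H_j\}$. Similarly, since $\sigma_j^2 = \sigma^2$ is the same for each state, a straightforward derivation yields the following re-estimation equation for $\hat\sigma^2$:

$$ \hat\sigma^2 = \frac{\sum_{i=0}^{N}\sum_{j=1}^{T} \gamma_j(i)\,\big(y(i) - \hat k H_j\big)^2}{\sum_{i=0}^{N}\sum_{j=1}^{T} \gamma_j(i)} \qquad (7.22) $$

$$ \hat\sigma^2 = \frac{1}{N+1}\sum_{i=0}^{N}\sum_{j=1}^{T} \gamma_j(i)\,\big(y(i) - \hat k H_j\big)^2, \qquad (7.23) $$

where the second form follows because $\sum_{j=1}^{T} \gamma_j(i) = 1$ for each $i$.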
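Putting the pieces together, the re-estimation loop for quantized orbit points alternates between computing $\gamma_j(i)$ and applying (7.21) and (7.23). The sketch below is a hypothetical illustration, not the implementation behind the results reported in this section: it assumes the doubling-map partition used above, simulates 100 observations with $k = 2$ and $\sigma^2 = 0.05$, and, mirroring the experimental setup described later, initializes both estimates to 1 and caps the loop at 50 iterations. The forward and backward variables are rescaled at every step to avoid underflow; those scale factors do not depend on $j$ and therefore cancel in (7.18). Because each update maximizes the expected complete-data log-likelihood, the recorded log-likelihood should not decrease from one iteration to the next.

```python
import math
import random


def gaussian_pdf(y, mean, var):
    """Scalar Gaussian density N(y; mean, var)."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def posteriors(y, P, pi, H, k, var):
    """Scaled forward-backward pass: returns gamma_j(n) of (7.18) and log p(Y)."""
    T, N = len(H), len(y) - 1
    emis = [[gaussian_pdf(y[n], k * H[j], var) for j in range(T)] for n in range(N + 1)]
    alpha, log_p = [], 0.0
    for n in range(N + 1):
        if n == 0:
            a = [pi[j] * emis[0][j] for j in range(T)]
        else:
            a = [sum(alpha[n - 1][l] * P[l][j] for l in range(T)) * emis[n][j]
                 for j in range(T)]
        c = sum(a)                      # c_n = p(y(n) | y(0), ..., y(n-1))
        log_p += math.log(c)
        alpha.append([v / c for v in a])
    beta = [[1.0] * T for _ in range(N + 1)]
    for n in range(N - 1, -1, -1):
        bn = [sum(P[j][l] * emis[n + 1][l] * beta[n + 1][l] for l in range(T))
              for j in range(T)]
        s = sum(bn)
        beta[n] = [v / s for v in bn]
    gamma = []
    for n in range(N + 1):
        w = [alpha[n][j] * beta[n][j] for j in range(T)]
        s = sum(w)
        gamma.append([v / s for v in w])
    return gamma, log_p


def reestimate(y, P, pi, H, k=1.0, var=1.0, max_iter=50, tol=1e-8):
    """Alternate gamma computation with (7.21) and (7.23); return k, var, log p(Y) history."""
    T = len(H)
    history = []
    for _ in range(max_iter):
        gamma, log_p = posteriors(y, P, pi, H, k, var)
        history.append(log_p)
        num = sum(g[j] * H[j] * y[i] for i, g in enumerate(gamma) for j in range(T))
        den = sum(g[j] * H[j] ** 2 for g in gamma for j in range(T))
        k_new = num / den                                           # (7.21)
        var_new = sum(g[j] * (y[i] - k_new * H[j]) ** 2
                      for i, g in enumerate(gamma) for j in range(T)) / len(y)  # (7.23)
        done = abs(k_new - k) < tol and abs(var_new - var) < tol
        k, var = k_new, var_new
        if done:
            break
    return k, var, history


# Hypothetical test case: doubling-map partition, k = 2, sigma^2 = 0.05, 100 observations.
P = [[0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, 0.5, 0.5],
     [0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, 0.5, 0.5]]
pi = [0.25] * 4
H = [0.125, 0.375, 0.625, 0.875]
random.seed(1)
state, y = random.randrange(4), []
for _ in range(100):
    y.append(2.0 * H[state] + random.gauss(0.0, math.sqrt(0.05)))
    state = random.choices(range(4), weights=P[state])[0]

k_hat, var_hat, history = reestimate(y, P, pi, H)
print(k_hat, var_hat)
```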


As discussed in [75], the Baum-Welch re-estimation procedure is closely related to the Expectation-Maximization (EM) algorithm, and, as with the EM algorithm, only local convergence is assured. As a consequence, the estimated parameter values converge to steady-state values that may or may not be the actual parameter values.

Two problems arise with unquantized orbit points. First, the HMM introduced in Section 7.2.3 is not an exact representation of the dynamics underlying the observations. Second, the derivation of closed-form re-estimation equations for an unknown scale factor and observation noise variance appears to be intractable. However, using the above re-estimation equations for quantized orbit points as a foundation, we can derive Baum-Welch-like re-estimation equations for the unknown scale factor and noise variance with the HMM for unquantized orbit points. Recall that with this model, each observation $y(n)$ is the sum of two random variables: one is the observation noise term, whose unknown variance we seek to estimate, and the other is a uniform random variable with mean $k L_j + \frac{k L_{j+1} - k L_j}{2}$ and variance $\frac{(k L_{j+1} - k L_j)^2}{12}$ for some $j$, where $L_j$ denotes the left endpoint of partition element $I_j$ and $k$ denotes the unknown scale factor. Thus, an intuitively reasonable re-estimation procedure for obtaining the unknown scale factor and variance is to use $k L_j + \frac{k L_{j+1} - k L_j}{2}$ in place of $k H_j$ in the above re-estimation equations for quantized orbit points, and to subtract from each term in the above expression for the variance estimate an estimate of the variance of the uniform random variable associated with that term, that is, a term of the form $\frac{(\hat k L_{j+1} - \hat k L_j)^2}{12}$. The resulting Baum-Welch-like re-estimation equations for unquantized orbit points are the following:


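$$ \hat k = \frac{\sum_{i=0}^{N}\sum_{j=1}^{T} \gamma_j(i)\,\Big(L_j + \frac{L_{j+1} - L_j}{2}\Big)\, y(i)}{\sum_{i=0}^{N}\sum_{j=1}^{T} \gamma_j(i)\,\Big(L_j + \frac{L_{j+1} - L_j}{2}\Big)^2} \qquad (7.24) $$

$$ \hat\sigma^2 = \frac{\sum_{i=0}^{N}\sum_{j=1}^{T} \gamma_j(i)\Big[\Big(y(i) - \hat k\Big(L_j + \frac{L_{j+1} - L_j}{2}\Big)\Big)^2 - \frac{(\hat k L_{j+1} - \hat k L_j)^2}{12}\Big]}{\sum_{i=0}^{N}\sum_{j=1}^{T} \gamma_j(i)} \qquad (7.25) $$

$$ \hat\sigma^2 = \frac{1}{N+1}\sum_{i=0}^{N}\sum_{j=1}^{T} \gamma_j(i)\Big[\Big(y(i) - \hat k\Big(L_j + \frac{L_{j+1} - L_j}{2}\Big)\Big)^2 - \frac{(\hat k L_{j+1} - \hat k L_j)^2}{12}\Big] \qquad (7.26) $$

where we define $L_{T+1}$ to be the right endpoint of $I_T$.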
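The update step (7.24) and (7.26) can be sketched as follows. This covers only the parameter update: the conditional state probabilities are passed in directly (here a hypothetical one-hot assignment), since the output density of the HMM for unquantized orbit points from Section 7.2.3 is not reproduced in this section. The function name and the 0-indexed endpoint list `L` are our own conventions. The example feeds noise-free observations located exactly at the scaled interval midpoints, which recovers $k$ exactly and also shows that subtracting the uniform-variance term can drive $\hat\sigma^2$ below zero, something a practical implementation would presumably need to guard against.

```python
def reestimate_unquantized(y, gamma, L):
    """One Baum-Welch-like update of k and sigma^2 following (7.24) and (7.26).

    gamma[i][j] -- conditional state probabilities gamma_j(i)
    L           -- partition endpoints, 0-indexed: L[j] is the left endpoint of the
                   j-th element and L[-1] the right endpoint of the last one
    """
    T = len(L) - 1
    mid = [L[j] + (L[j + 1] - L[j]) / 2.0 for j in range(T)]   # L_j + (L_{j+1} - L_j)/2
    num = sum(g[j] * mid[j] * y[i] for i, g in enumerate(gamma) for j in range(T))
    den = sum(g[j] * mid[j] ** 2 for g in gamma for j in range(T))
    k = num / den                                                       # (7.24)
    var = sum(g[j] * ((y[i] - k * mid[j]) ** 2
                      - (k * L[j + 1] - k * L[j]) ** 2 / 12.0)          # uniform-variance term
              for i, g in enumerate(gamma) for j in range(T)) / len(y)  # (7.26)
    return k, var


# Hypothetical check: noise-free observations at the scaled midpoints, known states.
L = [0.0, 0.25, 0.5, 0.75, 1.0]
states = [0, 1, 2, 3, 1, 0]
gamma = [[1.0 if j == s else 0.0 for j in range(4)] for s in states]
y = [2.0 * (L[s] + L[s + 1]) / 2.0 for s in states]
k_u, var_u = reestimate_unquantized(y, gamma, L)
print(k_u, var_u)   # k is recovered; var equals minus the uniform variance (2 * 0.25)**2 / 12
```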

Figures 7-9 and 7-10 depict the error in estimating the scale factor and noise variance for the EMC map shown in Figure 7-5. The actual scale factor used for the results was 2, and the noise variance was determined by the SNR. The performance measure used in Figures 7-9(a) and (b) is the squared estimation error averaged over 100 independent trials. Similarly, the performance measure used in Figures 7-10(a) and (b) is the squared estimation error averaged over 100 independent trials and normalized by the actual value of the variance. The curves in the figures are parameterized by the size of the uniform Markov partition used, with 100 observations used in the estimation equations for each plotted result. The scale factor and variance estimates were both initialized with the value 1. The estimation algorithms were iterated either until they converged or until an upper limit of 50 iterations was reached.

Figure 7-9: Mean-squared error (MSE) (average of 100 trials) for scale factor estimation for EMC map $f_3$ using 100 observations. (Actual value of scale factor was 2.) (a) Quantized orbit points; (b) Unquantized orbit points.

Figure 7-10: Normalized mean-squared error (MSE) (average of 100 trials) for noise variance estimation for EMC map $f_3$ using 100 observations. (Noise variance determined by input SNR.) (a) Quantized orbit points; (b) Unquantized orbit points.

As indicated by the figures, the results with quantized and unquantized orbit points are comparable except for variance estimation with large input SNRs and an 8-element Markov partition or, equivalently, an 8-state Markov chain in the HMMs. The results also indicate an insensitivity to the Markov partition size, except for variance estimation at high input SNRs with unquantized orbit points. Figures 7-11 and 7-12 depict the performance results for a fixed-size, 16-element uniform Markov partition (equivalently, a 16-state Markov chain in the HMMs) and differently sized orbit segments. The figures suggest a strong correlation between scale factor estimation accuracy and the orbit segment size but negligible correlation between variance estimation accuracy and the orbit segment size.


Figure 7-11: Mean-squared error (MSE) for scale factor estimation for EMC map $f_3$ using a 16-element uniform Markov partition. (Actual value of scale factor was 2.) (a) Quantized orbit points; (b) Unquantized orbit points.

The question arises as to the advantage of the re-estimation equations for unquantized orbit points over those for quantized orbit points when the observations arise from unquantized orbit points. Figure 7-13 depicts the absolute bias, defined as the absolute value of the difference between the estimated scale factor and the actual scale factor, averaged over 100 independent trials, obtained with both scale factor re-estimation equations when applied to observations arising from unquantized orbit points. As indicated in the figure, a nonnegligible positive bias arises at all SNRs with the re-estimation equation for quantized orbit points. Additional experiments have shown the bias to be dependent on the scale factor and to increase as the scale factor increases.

Figure 7-12: Normalized mean-squared error (MSE) for noise variance estimation for EMC map $f_3$ using a 16-element uniform Markov partition. (Noise variance determined by input SNR.) (a) Quantized orbit points; (b) Unquantized orbit points.

Figure 7-13: Estimated scale factor minus actual scale factor with re-estimation equations for quantized and unquantized orbit points using 100 observations arising from unquantized orbit points and a 16-element uniform Markov partition. (Actual value of scale factor was 2.)
7.5 Map Discriminability


A useful tool when selecting EMC maps for detection applications would be a metric that quantifies the discriminability among these maps. Unfortunately, finding such a metric remains an elusive goal. We have explored metrics based on divergence (or cross-entropy) rates of Markov chains [19] and Bhattacharyya distance measures [41] with little success. The fundamental problem is in finding a metric that is consistent with experimentally obtained detection results.

A useful empirical metric for hidden Markov models with finite observation alphabets was introduced in [39]. This metric, which uses asymptotic cross-information rates (defined below), is easily adapted for use with continuous-valued observations and EMC maps. Undesirable aspects of the metric are that it is a function of the observation noise variance and that its calculation may require a large set of simulated observations for each map being compared.

The metric arises as follows. We start with a set of $M$ hidden Markov models $\{f_j\}_{j=1}^{M}$, each consisting of an underlying homogeneous, finite-state Markov chain and a state-dependent output. For the problems of interest here, the underlying Markov chains are those corresponding to Markov partitions of EMC maps, and the output associated with each state of the Markov chain is either a Gaussian random variable when orbit points are quantized or the sum of a Gaussian random variable and a uniformly distributed random variable when orbit points are not quantized.

We let $Y_j$ denote a set of $n$ observations associated with $f_j$ and define an $n$-point information rate $E(j,j,n)$ as

$$ E(j,j,n) = \frac{1}{n}\log p(Y_j \mid f_j) \qquad (7.27) $$

and $n$-point cross-information rates $E(k,j,n)$ as

$$ E(k,j,n) = \frac{1}{n}\log p(Y_j \mid f_k), \quad k = 1, 2, \ldots, M,\ k \neq j. \qquad (7.28) $$

These rates are simply log-likelihood values normalized by the number of observations. Note that these quantities are not entropy rates, as they do not involve an expectation over the observation set.
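The $n$-point rates (7.27)-(7.28) can be computed for every $n$ in a single pass, because a scaled forward recursion produces $\log p(y(0), \ldots, y(n-1) \mid f_k)$ as a running sum. In the hypothetical sketch below, the two models are 4-state quantized-output HMMs built from uniform partitions of the doubling map and the tent map, with unit scale factor; the model tuples, the noise variance, and the function names are illustrative assumptions rather than the maps of Figures 7-1 and 7-2.

```python
import math
import random


def gaussian_pdf(y, mean, var):
    """Scalar Gaussian density N(y; mean, var)."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def prefix_log_likelihoods(y, P, pi, H, var):
    """log p(y(0), ..., y(n) | model) for n = 0, ..., N via a scaled forward recursion."""
    T = len(H)
    out, total, a_hat = [], 0.0, None
    for n, yn in enumerate(y):
        if n == 0:
            a = [pi[j] * gaussian_pdf(yn, H[j], var) for j in range(T)]
        else:
            a = [sum(a_hat[l] * P[l][j] for l in range(T)) * gaussian_pdf(yn, H[j], var)
                 for j in range(T)]
        c = sum(a)
        total += math.log(c)
        a_hat = [v / c for v in a]
        out.append(total)
    return out


def information_rates(y, model, var):
    """E(k, j, n) = (1/n) log p(Y_j | f_k) for n = 1, ..., len(y), per (7.27)-(7.28)."""
    P, pi, H = model
    return [lp / (n + 1) for n, lp in enumerate(prefix_log_likelihoods(y, P, pi, H, var))]


# Hypothetical 4-state HMMs from uniform partitions of the doubling and tent maps.
H = [0.125, 0.375, 0.625, 0.875]
pi = [0.25] * 4
doubling = ([[0.5, 0.5, 0, 0], [0, 0, 0.5, 0.5], [0.5, 0.5, 0, 0], [0, 0, 0.5, 0.5]], pi, H)
tent = ([[0.5, 0.5, 0, 0], [0, 0, 0.5, 0.5], [0, 0, 0.5, 0.5], [0.5, 0.5, 0, 0]], pi, H)

# Simulate Y_1 from the doubling-map HMM at an assumed noise variance.
var = 0.02
random.seed(2)
state, y1 = random.randrange(4), []
for _ in range(200):
    y1.append(H[state] + random.gauss(0.0, math.sqrt(var)))
    state = random.choices(range(4), weights=doubling[0][state])[0]

E11 = information_rates(y1, doubling, var)   # information rate E(1, 1, n)
E21 = information_rates(y1, tent, var)       # cross-information rate E(2, 1, n)
print(E11[-1], E21[-1], E11[-1] - E21[-1])
```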
For a restricted form of this scenario in which only a finite number of values are possible for each output, it was shown in [70] that each of the following limits exists:

$$ E(k,j) = \lim_{n \to \infty} \frac{1}{n}\log p(Y_j \mid f_k), \quad k = 1, \ldots, M, \qquad (7.29) $$

and that the following relations hold:

$$ E(j,j) \ge E(k,j), \quad k = 1, \ldots, M. \qquad (7.30) $$

For the detection problems of interest here involving continuous-valued outputs, convergence of the $n$-point information rates has not been proven. Nonetheless, experimental results obtained with EMC maps suggest that the $n$-point information and cross-information rates approach asymptotic values, with small perturbations about these values, as $n$ grows large. For example, Figure 7-14 depicts differences of information and cross-information rates for EMC maps $f_1$ and $f_2$, shown in Figures 7-1 and 7-2, as a function of time with an input SNR of 0 dB. As the figures suggest, the differences appear to approach asymptotic mean values with small perturbations about the mean.

Figure 7-14: Information rate differences as a function of time for EMC maps $f_1$ and $f_2$ with an input SNR of 0 dB. (a) Quantized orbit points; (b) Unquantized orbit points.

The motivation for showing the rate differences $E(1,1,n) - E(2,1,n)$ and $E(2,2,n) - E(1,2,n)$, rather than the individual information and cross-information rates, is the close relation between these differences and the discrimination criterion used in the detection algorithms introduced earlier. In particular, using (7.27) and (7.28), we have the following:

$$ E(1,1,n) - E(2,1,n) = \frac{1}{n}\log\frac{p(Y_1 \mid f_1)}{p(Y_1 \mid f_2)} \qquad (7.31) $$

$$ E(2,2,n) - E(1,2,n) = \frac{1}{n}\log\frac{p(Y_2 \mid f_2)}{p(Y_2 \mid f_1)} \qquad (7.32) $$


With the detection algorithms, the discrimination criterion consists of deciding upon the map for which the corresponding likelihood $p(Y_j \mid f_k)$ is largest for a fixed observation set $Y_j$. For binary detection involving maps $f_1$ and $f_2$, a correct decision is made when (7.31) is positive for the case in which $f_1$ generated the observations. Similarly, a correct decision is made when (7.32) is positive for the case in which $f_2$ generated the observations. As such, the differences $E(1,1,n) - E(2,1,n)$ and $E(2,2,n) - E(1,2,n)$ are qualitative indicators of the discriminability between the maps as $n$ grows large, with larger positive values of the differences suggesting greater discriminability.
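The decision rule and the rate differences (7.31)-(7.32) reduce to comparing log-likelihoods. The sketch below reuses the hypothetical doubling-map and tent-map HMMs from the previous example. Its two observation sets are contrived, noise-free state sequences, each containing transitions that the other map's Markov chain forbids, and the small noise variance makes the correct decision certain in this artificial case. All names are illustrative.

```python
import math


def gaussian_pdf(y, mean, var):
    """Scalar Gaussian density N(y; mean, var)."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(y, model, var):
    """log p(Y | f) for a quantized-output HMM (P, pi, H), via a scaled forward recursion."""
    P, pi, H = model
    T = len(H)
    total, a_hat = 0.0, None
    for n, yn in enumerate(y):
        if n == 0:
            a = [pi[j] * gaussian_pdf(yn, H[j], var) for j in range(T)]
        else:
            a = [sum(a_hat[l] * P[l][j] for l in range(T)) * gaussian_pdf(yn, H[j], var)
                 for j in range(T)]
        c = sum(a)
        total += math.log(c)
        a_hat = [v / c for v in a]
    return total


def rate_difference(y, model_a, model_b, var):
    """(1/n) log[p(Y | f_a) / p(Y | f_b)], the quantity in (7.31) and (7.32)."""
    return (log_likelihood(y, model_a, var) - log_likelihood(y, model_b, var)) / len(y)


def decide(y, models, var):
    """Index of the map whose likelihood p(Y | f_k) is largest."""
    scores = [log_likelihood(y, m, var) for m in models]
    return max(range(len(models)), key=lambda i: scores[i])


H = [0.125, 0.375, 0.625, 0.875]
pi = [0.25] * 4
doubling = ([[0.5, 0.5, 0, 0], [0, 0, 0.5, 0.5], [0.5, 0.5, 0, 0], [0, 0, 0.5, 0.5]], pi, H)
tent = ([[0.5, 0.5, 0, 0], [0, 0, 0.5, 0.5], [0, 0, 0.5, 0.5], [0.5, 0.5, 0, 0]], pi, H)
var = 0.001

# The first sequence uses a transition (2 -> 0) the tent chain forbids; the second
# uses transitions (e.g. 3 -> 0) the doubling chain forbids.
y1 = [H[s] for s in [0, 1, 2, 0, 1, 3, 3, 2, 0]]
y2 = [H[s] for s in [0, 1, 3, 0, 1, 2, 2, 3, 1]]

print(rate_difference(y1, doubling, tent, var), decide(y1, [doubling, tent], var))
print(rate_difference(y2, tent, doubling, var), decide(y2, [doubling, tent], var))
```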

Figure 7-15 depicts information rate differences for EMC maps $f_3$ and $f_4$, shown in Figures 7-5 and 7-6, as a function of time with an input SNR of 0 dB. As the figures suggest, and as the performance results presented earlier confirm, the discriminability between $f_1$ and $f_2$ is much greater than that between $f_3$ and $f_4$.

Figure 7-15: Information rate differences as a function of time for EMC maps $f_3$ and $f_4$ with an input SNR of 0 dB. (a) Quantized orbit points; (b) Unquantized orbit points.

7.6 State Estimation with MC Maps

7.6.1 Problem Overview

In the preceding sections of this chapter, we have shown how the close relation between EMC maps and Markov chains facilitates the detection or discrimination of these maps based on noise-corrupted orbit segments. In this section, we show how this relation facilitates practical, optimal and suboptimal ML state estimation with EMC maps and, more generally, MC maps.

We focus on the fixed-interval smoothing scenario in which we are given an MC map $f$, an unobserved $(N+1)$-point orbit segment $\{x(i) = f^i(x(0))\}_{i=0}^{N}$ generated by $f$, and a set of $N+1$ observations $Y = \{y(i)\}_{i=0}^{N}$ with $y(i)$ given by

$$ y(i) = x(i) + v(i), \qquad (7.33) $$

where $\{v(i)\}_{i=0}^{N}$ is a white Gaussian noise sequence with variance $\sigma^2$. It follows from earlier results in the thesis that for this scenario the log-likelihood function for the $j$th orbit point, $\log p(Y; x(j))$, is given by

$$ \log p(Y; x(j)) = C(N) - \sum_{i=0}^{N} \frac{\big(y(i) - f^{(i-j)}_{(x(0),j)}(x(j))\big)^2}{2\sigma^2}, \qquad (7.34) $$

where $C(N)$ is a constant independent of both the observation set $Y$ and the orbit segment, and where $f^{(i-j)}_{(x(0),j)}(x(j))$ equals $f^{i-j}(x(j))$ if $i - j \ge 0$ and equals the actual value of the inverse image $f^{-(j-i)}(x(j))$ that gave rise to the observations if $i - j < 0$. In other words, we assume that there is a fixed initial condition $x(0)$ so that the following holds:

$$ x(i) = f^{i}(x(0)) = f^{(i-j)}_{(x(0),j)}(x(j)), \quad i, j = 0, \ldots, N. \qquad (7.35) $$

Because $f$ is a deterministic mapping, there is a bijective correspondence between initial conditions $x(0)$ and $(N+1)$-point orbit segments $\{f^i(x(0))\}_{i=0}^{N}$, so that any property associated with a specific orbit segment can be associated with a specific initial condition as well.

We let $L$ denote the smallest number of affine segments of $f$, in the sense that two affine segments are considered part of the same segment if they have the same affine parameter pair. Also, we let $(\tau_i, \beta_i)$ and $A_i$ denote the affine parameter pair and segment domain, respectively, associated with the $i$th affine segment, so that $f(x) = \tau_i x + \beta_i$ if $x \in A_i$. Note that each $A_i$ is a subinterval of the unit interval, disjoint from all the other $A_i$, and the set of subintervals $\{A_i\}_{i=1}^{L}$ is a partition of the unit interval but not necessarily a Markov partition of $f$.

We can associate sequences of $N+1$ affine parameter pairs and affine segment domains with each $(N+1)$-point orbit segment or, equivalently, with each initial condition. In particular, for the initial condition $x$, we associate the sequences $\{(\tau(i,x), \beta(i,x))\}_{i=0}^{N}$ and $\{A(i,x)\}_{i=0}^{N}$, where $\tau(i,x) = \tau_j$, $\beta(i,x) = \beta_j$, and $A(i,x) = A_j$ if $f^i(x) \in A_j$. One can show that because each $A_i$ is a subinterval disjoint from all the other $A_i$, the set of initial conditions with the same associated sequence of segment domains $\{A(i,x)\}_{i=0}^{N}$ is a subinterval $A(x)$, which is given by

$$ A(x) = \bigcap_{i=0}^{N} f^{-i}\big(A(i,x)\big), \qquad (7.36) $$

where $f^{-i}$ denotes the possibly multiple-valued inverse image of the composed map $f^i$.

It is straightforward but tedious to show that because $f$ is piecewise linear, $f^i$ is piecewise linear as well, so that

$$ f^i(x) = T(i,0,x)\,x + B(i,0,x), \quad i = 0, \ldots, N, \qquad (7.37) $$

where

$$ T(0,0,x) = 1 \qquad (7.38) $$

$$ B(0,0,x) = 0 \qquad (7.39) $$

and for $i > 0$

$$ T(i,0,x) = \prod_{k=0}^{i-1} \tau(k,x) \qquad (7.40) $$

$$ B(i,0,x) = \beta(i-1,x) + \sum_{k=0}^{i-2} \beta(k,x) \prod_{l=k+1}^{i-1} \tau(l,x). \qquad (7.41) $$

In addition, for a given orbit segment $\{x(i) = f^i(x)\}_{i=0}^{N}$, the following holds:

$$ f^{(i-j)}_{(x,j)}(x(j)) = T(i,j,x)\,x(j) + B(i,j,x), \quad i, j = 0, \ldots, N, \qquad (7.42) $$

where

$$ T(j,j,x) = 1 \qquad (7.43) $$

$$ B(j,j,x) = 0, \qquad (7.44) $$

while for $i > j$

$$ T(i,j,x) = \prod_{k=0}^{i-j-1} \tau(k+j,x) \qquad (7.45) $$

$$ B(i,j,x) = \beta(i-1,x) + \sum_{k=0}^{i-j-2} \beta(k+j,x) \prod_{l=k+1}^{i-j-1} \tau(l+j,x), \qquad (7.46) $$

and for $i < j$

$$ T(i,j,x) = \frac{1}{\prod_{k=0}^{j-i-1} \tau(i+k,x)} \qquad (7.47) $$

$$ B(i,j,x) = -T(i,j,x)\Big[\beta(j-1,x) + \sum_{k=0}^{j-i-2} \beta(i+k,x) \prod_{l=k+1}^{j-i-1} \tau(i+l,x)\Big]. \qquad (7.48) $$
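Extending the same hypothetical tent-map example, the following sketch implements (7.43)-(7.48) for all index pairs and verifies (7.42) exactly, including the backward direction $i < j$, in which $T(i,j,x)$ is the reciprocal of a product of slopes. The function name `T_B` is our own.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def affine_pair(x):
    """(tau, beta) of the tent-map segment containing x."""
    return (2, 0) if x < HALF else (-2, 2)


def T_B(pairs, i, j):
    """T(i, j, x) and B(i, j, x) of (7.42), following (7.43)-(7.48)."""
    tau = [p[0] for p in pairs]
    beta = [p[1] for p in pairs]
    if i == j:
        return Fraction(1), Fraction(0)               # (7.43)-(7.44)
    if i > j:
        T = Fraction(1)
        for k in range(i - j):                        # k = 0, ..., i-j-1, (7.45)
            T *= tau[k + j]
        B = Fraction(beta[i - 1])
        for k in range(i - j - 1):                    # k = 0, ..., i-j-2, (7.46)
            prod = 1
            for l in range(k + 1, i - j):             # l = k+1, ..., i-j-1
                prod *= tau[l + j]
            B += beta[k + j] * prod
        return T, B
    denom = 1
    for k in range(j - i):                            # k = 0, ..., j-i-1, (7.47)
        denom *= tau[i + k]
    T = Fraction(1, denom)
    inner = Fraction(beta[j - 1])
    for k in range(j - i - 1):                        # k = 0, ..., j-i-2, (7.48)
        prod = 1
        for l in range(k + 1, j - i):                 # l = k+1, ..., j-i-1
            prod *= tau[i + l]
        inner += beta[i + k] * prod
    return T, -T * inner


x0, N = Fraction(3, 11), 6
orbit = [x0]
for _ in range(N):
    tau, beta = affine_pair(orbit[-1])
    orbit.append(tau * orbit[-1] + beta)
pairs = [affine_pair(x) for x in orbit]

ok = all(T_B(pairs, i, j)[0] * orbit[j] + T_B(pairs, i, j)[1] == orbit[i]
         for i in range(N + 1) for j in range(N + 1))
print(ok)   # True
```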


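Therefore, we can express (7.34) as

$$ \log p(Y; x(j)) = C(N) - \sum_{i=0}^{N} \frac{\big(y(i) - T(i,j,x)\,x(j) - B(i,j,x)\big)^2}{2\sigma^2}, \qquad (7.49) $$

which, for a fixed sequence of affine parameter pairs, is a quadratic function of the unknown orbit point $x(j)$.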


7.6.2 ML State Estimation Considerations


We first consider maximizing (7.49) for the special case in which $j = 0$, so that $x(j) = x(0) = x$ and (7.49) becomes

$$ \log p(Y; x) = \log p(Y; x(0)) = C(N) - \sum_{i=0}^{N} \frac{\big(y(i) - T(i,0,x)\,x - B(i,0,x)\big)^2}{2\sigma^2}. \qquad (7.50) $$

If the sequence of parameter pairs $\{(T(i,0,x), B(i,0,x))\}_{i=0}^{N}$ is known and independent of $x$, then finding the value(s) of $x$ for which (7.50) has extremal values for a given observation set is a straightforward calculus problem with a closed-form solution. In this special case, (7.50) has a unique extremum (except possibly for degenerate cases) because it is quadratic in $x$. Furthermore, this extremum is a maximum because (7.50) decreases without bound as $|x|$ grows.

Now consider the case in which the affine parameter pairs $\{(T(i,0,x), B(i,0,x))\}_{i=0}^{N}$ are those associated with $\hat x_{ML}$, the ML estimate of the initial condition $x$. In other words, as with any initial condition, associated with $\hat x_{ML}$ is an orbit segment $\{f^i(\hat x_{ML})\}_{i=0}^{N}$, a sequence of affine parameter pairs $\{(\tau(i,\hat x_{ML}), \beta(i,\hat x_{ML}))\}_{i=0}^{N}$, and a sequence of affine segment domains $\{A(i,\hat x_{ML})\}_{i=0}^{N}$. In addition, there is a sequence of parameter pairs $\{(T(i,0,\hat x_{ML}), B(i,0,\hat x_{ML}))\}_{i=0}^{N}$ uniquely determined from $\{(\tau(i,\hat x_{ML}), \beta(i,\hat x_{ML}))\}_{i=0}^{N}$. Substituting the parameter pairs $\{(T(i,0,\hat x_{ML}), B(i,0,\hat x_{ML}))\}_{i=0}^{N}$ in (7.50) and maximizing over $x$ yields the following expression for the maximizing value $x_{MAX}$:

$$ x_{MAX} = \frac{\sum_{i=0}^{N} T(i,0,\hat x_{ML})\big(y(i) - B(i,0,\hat x_{ML})\big)}{\sum_{i=0}^{N} T^2(i,0,\hat x_{ML})}. \qquad (7.51) $$


We now show that $x_{MAX}$ equals $\hat x_{ML}$ if it is in $A(\hat x_{ML})$, the subinterval of initial conditions with the associated sequence of affine segment domains $\{A(i,\hat x_{ML})\}_{i=0}^{N}$. If not, $\hat x_{ML}$ is the endpoint of $A(\hat x_{ML})$ for which (7.50) has the larger value. The validity of this assertion follows from the fact that, by definition, $x_{MAX}$ is the value of $x$ that maximizes (7.50) for the fixed sequence of parameter pairs $\{(T(i,0,\hat x_{ML}), B(i,0,\hat x_{ML}))\}_{i=0}^{N}$. Also by definition, $\hat x_{ML}$ is the value of $x$ that maximizes (7.50) when the expression is evaluated with the correct parameter pairs $\{(T(i,0,x), B(i,0,x))\}_{i=0}^{N}$ associated with $x$. Since (7.50) has a single maximum for a given sequence of parameter pairs, and $\{(T(i,0,\hat x_{ML}), B(i,0,\hat x_{ML}))\}_{i=0}^{N}$ is the sequence associated with $\hat x_{ML}$, it follows that $x_{MAX}$ equals $\hat x_{ML}$ when the former is in the subinterval of initial conditions associated with $\{(T(i,0,\hat x_{ML}), B(i,0,\hat x_{ML}))\}_{i=0}^{N}$. However, it is possible that $x_{MAX}$ lies outside this subinterval. In this case, because (7.50) is a quadratic function of $x$ (for a fixed sequence of parameter pairs) and because $\hat x_{ML} \in A(\hat x_{ML})$, it follows that $\hat x_{ML}$ is the endpoint of $A(\hat x_{ML})$ closer to $x_{MAX}$ or, equivalently, the endpoint for which (7.50) has the larger value.

What we have shown thus far is that if $\{(\tau(i,\hat x_{ML}), \beta(i,\hat x_{ML}))\}_{i=0}^{N}$, the sequence of affine parameter pairs associated with the ML orbit segment, is known, one can in theory determine the ML orbit segment by first evaluating (7.51) to obtain $x_{MAX}$. If $x_{MAX} \in A(\hat x_{ML})$, then $\hat x_{ML}$, the ML estimate of $x(0)$, equals $x_{MAX}$; if not, then $\hat x_{ML}$ equals the endpoint of $A(\hat x_{ML})$ closer to $x_{MAX}$. Having obtained $\hat x_{ML}$, one can obtain $\hat x_{ML}(i)$, the ML estimate of the $i$th point of the orbit segment, by using the relation $\hat x_{ML}(i) = f^i(\hat x_{ML})$. However, in practice it is often difficult to determine $A(\hat x_{ML})$. An additional practical concern is that the theoretical relation $\hat x_{ML}(i) = f^i(\hat x_{ML})$ is generally not a useful relation with EMC maps, since these maps are expanding, and thus inevitable computer round-off error in the determination of $\hat x_{ML}$ is amplified by successive compositions of the maps with themselves.
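The procedure just summarized can be sketched for the case in which the affine parameter pairs associated with $\hat x_{ML}$ are known. In the hypothetical example below (tent map, noise-free observations, and the true itinerary standing in for the ML itinerary), the pairs $(T(i,0), B(i,0))$ are accumulated with the recursion $T(i+1,0) = \tau(i)\,T(i,0)$, $B(i+1,0) = \tau(i)\,B(i,0) + \beta(i)$, which is algebraically equivalent to (7.40)-(7.41). The interval passed to `ml_initial_condition` plays the role of $A(\hat x_{ML})$ and is chosen arbitrarily to exercise the endpoint rule; all function names are our own.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def affine_pair(x):
    """(tau, beta) of the tent-map segment containing x."""
    return (2, 0) if x < HALF else (-2, 2)


def coefficients_from_origin(pairs):
    """(T(i,0), B(i,0)) via T(i+1,0) = tau(i) T(i,0), B(i+1,0) = tau(i) B(i,0) + beta(i)."""
    T, B = [Fraction(1)], [Fraction(0)]
    for tau, beta in pairs[:-1]:
        T.append(tau * T[-1])
        B.append(tau * B[-1] + beta)
    return T, B


def x_max(y, T, B):
    """(7.51): maximizer of (7.50) for a fixed sequence of pairs (T(i,0), B(i,0))."""
    return sum(t * (yi - b) for yi, t, b in zip(y, T, B)) / sum(t * t for t in T)


def log_lik_shape(x, y, T, B):
    """(7.50) up to the constant C(N) and the positive factor 1/(2 sigma^2)."""
    return -sum((yi - t * x - b) ** 2 for yi, t, b in zip(y, T, B))


def ml_initial_condition(y, T, B, interval):
    """x_ML = x_MAX if x_MAX lies in A(x_ML); otherwise the endpoint closer to x_MAX."""
    lo, hi = interval
    xm = x_max(y, T, B)
    if lo <= xm <= hi:
        return xm
    # For a concave quadratic the closer endpoint also has the larger value of (7.50).
    return lo if abs(xm - lo) <= abs(xm - hi) else hi


x0, N = Fraction(3, 11), 8
orbit = [x0]
for _ in range(N):
    tau, beta = affine_pair(orbit[-1])
    orbit.append(tau * orbit[-1] + beta)
pairs = [affine_pair(x) for x in orbit]
T, B = coefficients_from_origin(pairs)
y = list(orbit)                                  # noise-free observations (illustration only)

print(x_max(y, T, B) == x0)                      # True
# An arbitrary interval that excludes x_MAX exercises the endpoint rule.
x_clamped = ml_initial_condition(y, T, B, (HALF, Fraction(3, 4)))
print(x_clamped)                                 # 1/2
```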
We now consider an alternative approach for calculating the ML orbit segment. The approach is equivalent in theory to the approach just discussed, but it is more useful in practice. It involves first obtaining $\hat x_{ML}(N)$, the ML estimate of the final point of the orbit segment, instead of $\hat x_{ML}$, the ML estimate of the first point of the orbit segment. The approach can be used with any EMC map for which the image of each affine segment domain under $f$ is a union of affine segment domains and the slope of each affine segment is an integer with absolute value greater than one.

Because $f$ is deterministic and $x_{MAX}$ is the value of $x(0)$ that maximizes (7.49) for $j = 0$ when $T(i,0,x) = T(i,0,\hat x_{ML})$ and $B(i,0,x) = B(i,0,\hat x_{ML})$ for each $i = 0, \ldots, N$, it follows that $f^N(x_{MAX})$ is the value of $x(N)$ that maximizes (7.49) for $j = N$ when $T(i,N,x) = T(i,N,\hat x_{ML})$ and $B(i,N,x) = B(i,N,\hat x_{ML})$ for each $i$. However, direct maximization of (7.49) for the fixed sequence of affine parameter pairs $\{(T(i,N,\hat x_{ML}), B(i,N,\hat x_{ML}))\}$ yields the following for the maximizing value $x_{MAX}(N)$:

$$ x_{MAX}(N) = \frac{\sum_{i=0}^{N} T(i,N,\hat x_{ML})\big(y(i) - B(i,N,\hat x_{ML})\big)}{\sum_{i=0}^{N} T^2(i,N,\hat x_{ML})}. \qquad (7.52) $$


Since $f^N(x_{MAX})$ and $x_{MAX}(N)$ both maximize $\log p(Y; x(N))$, which is a quadratic expression in $x(N)$ with a single extremum, it follows that $x_{MAX}(N) = f^N(x_{MAX})$ and thus $x_{MAX} = f^{(-N)}_{(\hat x_{ML},N)}(x_{MAX}(N))$, where $f^{(-N)}_{(\hat x_{ML},N)}$ denotes the invertible mapping implicitly defined by the sequence of affine parameter pairs $\{(\tau(i,\hat x_{ML}), \beta(i,\hat x_{ML}))\}_{i=0}^{N}$. Thus, one can determine $x_{MAX}$ by first evaluating (7.52) and then using the relation $x_{MAX} = f^{(-N)}_{(\hat x_{ML},N)}(x_{MAX}(N))$.

The question arises as to the relevance and value of this alternative method for calculating $x_{MAX}$. There is a twofold answer. First, let $f$ be any MC map, not necessarily an EMC map, for which the image of each affine segment domain under $f$ is a union of affine segment domains, and let $A_i$ and $A_j$ be any two segment domains satisfying $A_i \cap f^{-1}(A_j) \neq \emptyset$. Then $f^{-1}(x) \cap A_i \neq \emptyset$ for each $x \in A_j$. In other words, since $f(A_i)$ is a union of affine segment domains, if $A_j$ is in that union and $x \in A_j$, then at least one point in the possibly multiple-point set $f^{-1}(x)$ lies in $A_i$.

It further follows by induction that if $f$ has this property of affine segment domains mapping onto unions of affine segment domains and $y \in A(N,\hat x_{ML})$, where $A(N,\hat x_{ML})$ is the affine segment domain containing $f^N(\hat x_{ML})$, then $f^{-N}(y) \cap A(0,\hat x_{ML}) \neq \emptyset$ and, more importantly, $f^{-N}(y) \cap A(\hat x_{ML}) \neq \emptyset$, as well as $f^{(-N)}_{(\hat x_{ML},N)}(y) \in A(\hat x_{ML})$. What this means is that an equivalent condition for checking whether $x_{MAX} \in A(\hat x_{ML})$ is to check whether $x_{MAX}(N) \in A(N,\hat x_{ML})$, a far simpler task since $A(N,\hat x_{ML})$ is an affine segment domain and these domains are known a priori.

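As a result, if $x_{MAX}(N) \in A(N,\hat x_{ML})$, then $x_{MAX} = f^{(-N)}_{(\hat x_{ML},N)}(x_{MAX}(N)) \in A(\hat x_{ML})$ and thus $\hat x_{ML} = x_{MAX} = f^{(-N)}_{(\hat x_{ML},N)}(x_{MAX}(N))$. If $x_{MAX}(N) \notin A(N,\hat x_{ML})$, then by an argument similar to that used earlier, $\hat x_{ML}(N) = f^N(\hat x_{ML})$ is given by the endpoint $z$ of $A(N,\hat x_{ML})$ for which $\log p(Y; x(N))$ has the larger value, and $\hat x_{ML} = f^{(-N)}_{(\hat x_{ML},N)}(z)$.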

The second part of the twofold answer to the question of the value of using $x_{MAX}(N)$ to determine $x_{MAX}$ involves the constraint that $f$ be an EMC map for which the slope of each affine segment is an integer with absolute value greater than one. As discussed in Section 6.7, such an EMC map has a recoverability or invertibility property in the sense that the entire orbit segment is recoverable from the final orbit point in both a practical and a theoretical sense. Therefore, given such an EMC map, we can recover $\hat x_{ML}$ and the entire ML orbit segment from $\hat x_{ML}(N)$, in theory and in practice, if the inverse system $\{f^{(i-N)}_{(\hat x_{ML},N)}\}_{i=0}^{N}$ is known, by using the equality

$$ \hat x_{ML}(i) = f^{(i-N)}_{(\hat x_{ML},N)}\big(\hat x_{ML}(N)\big). \qquad (7.53) $$

The above equality is theoretically equivalent to the equality

$$ \hat x_{ML}(i) = f^{i}(\hat x_{ML}). \qquad (7.54) $$

However, the former equality is more useful in practice because the composed inverse system $f^{(i-N)}_{(\hat x_{ML},N)}$ is contracting and consequently does not amplify computer round-off error, as is the case with the composed system $f^i$.
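The backward approach can be sketched in the same hypothetical tent-map setting (slopes $\pm 2$, and each segment domain maps onto a union of segment domains). The code assumes, as the text does at this point, that the affine parameter pairs of the ML orbit segment are known; the true itinerary is used as a stand-in. It computes $x_{MAX}(N)$ from (7.52), clamps it to the known segment domain $A(N, \hat x_{ML})$, and then applies the inverse affine maps backward as in (7.53). The coefficients are built with $T(i,N) = T(i+1,N)/\tau(i)$ and $B(i,N) = (B(i+1,N) - \beta(i))/\tau(i)$, which is equivalent to (7.47)-(7.48). For contrast, the sketch also iterates forward from a floating-point initial condition, where round-off error is multiplied by $|\tau| = 2$ at each step. The noise level and seed are arbitrary choices.

```python
import random
from fractions import Fraction

HALF = Fraction(1, 2)


def affine_pair(x):
    """(tau, beta) of the tent-map segment containing x: A_1 = [0, 1/2), A_2 = [1/2, 1]."""
    return (2, 0) if x < HALF else (-2, 2)


def coefficients_to_final(pairs):
    """(T(i,N), B(i,N)) via T(i,N) = T(i+1,N)/tau(i), B(i,N) = (B(i+1,N) - beta(i))/tau(i)."""
    N = len(pairs) - 1
    T, B = [0.0] * (N + 1), [0.0] * (N + 1)
    T[N], B[N] = 1.0, 0.0
    for i in range(N - 1, -1, -1):
        tau, beta = pairs[i]
        T[i] = T[i + 1] / tau
        B[i] = (B[i + 1] - beta) / tau
    return T, B


def x_max_final(y, T, B):
    """(7.52): maximizer of log p(Y; x(N)) for fixed pairs (T(i,N), B(i,N))."""
    return sum(t * (yi - b) for yi, t, b in zip(y, T, B)) / sum(t * t for t in T)


def recover_orbit(x_final, pairs):
    """(7.53): apply the contracting inverse affine maps backward from x(N)."""
    orbit = [x_final]
    for tau, beta in reversed(pairs[:-1]):            # tau(N-1), ..., tau(0)
        orbit.append((orbit[-1] - beta) / tau)
    return orbit[::-1]


x0, N = Fraction(3, 11), 40
exact = [x0]
for _ in range(N):
    tau, beta = affine_pair(exact[-1])
    exact.append(tau * exact[-1] + beta)
pairs = [affine_pair(x) for x in exact]              # stand-in for the ML itinerary

random.seed(3)
y = [float(x) + random.gauss(0.0, 0.01) for x in exact]
T, B = coefficients_to_final(pairs)
xN = x_max_final(y, T, B)
lo, hi = (0.0, 0.5) if pairs[N][0] == 2 else (0.5, 1.0)   # known domain A(N, x_ML)
xN = min(max(xN, lo), hi)                             # endpoint rule if x_MAX(N) falls outside
backward = recover_orbit(xN, pairs)

# For contrast: iterate forward from float(x0); the error doubles at every step.
forward = [float(x0)]
for i in range(N):
    tau, beta = pairs[i]
    forward.append(tau * forward[-1] + beta)

print(abs(backward[0] - float(x0)), abs(forward[N] - float(exact[N])))
```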

At first glance, it appears that (7.51) is of little value in finding $\hat x_{ML}$, and (7.52) is of little value in finding $\hat x_{ML}(N)$, since both expressions use the set of affine parameter pairs $\{(\tau(i,\hat x_{ML}), \beta(i,\hat x_{ML}))\}_{i=0}^{N}$, knowledge of which apparently requires knowledge of the unknown ML estimate $\hat x_{ML}$. In [65, 66, 67], it was shown that for a special class of maps known as generalized tent maps, the affine parameter pairs $\{(\tau(i,\hat x_{ML}), \beta(i,\hat x_{ML}))\}$ and the observations are causally related. For maps in this special class, one can determine each affine parameter pair $(\tau(i,\hat x_{ML}), \beta(i,\hat x_{ML}))$ from the subset of observations $\{y(j)\}_{j=0}^{i}$, thereby allowing recursive, computationally efficient ML orbit point estimation and simultaneous determination of the affine parameter pairs. The method in [65] does not appear to be adaptable to the more general class of MC maps, since the affine parameter pairs and observations are not causally related for these maps in general.

One method of obtaining the sequence of affine parameter pairs associated with $\hat{x}_{ML}$ is the computationally intensive, brute-force method of evaluating (7.51) for each possible sequence of affine parameter pairs, determining the most likely orbit segment associated with each sequence, and choosing the sequence of parameter pairs whose most likely orbit segment yields the largest value of the likelihood function. The orbit segment selected in this way is indeed the ML orbit segment, but the number of candidate sequences grows rapidly with $N$. What we seek is a computationally simpler estimator for the ML orbit segment. In the next section, we introduce such an estimator; the estimator exploits the relation between MC maps and HMMs discussed earlier in the chapter. The estimator is an optimal ML estimator if the HMM it uses is chosen appropriately. Otherwise, the estimator is a suboptimal ML estimator, but one that is potentially effective nevertheless.
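For a short orbit segment, the brute-force method can be written out directly. The sketch below is illustrative only and assumes the hypothetical map f(x) = 3x mod 1 with Gaussian observation noise, so that maximizing the likelihood is a least-squares problem. For each sequence of affine segments it writes f^i(x) = a_i x + b_i on the corresponding interval of initial conditions, computes the unconstrained maximizer, and clips it to that interval (using the closed interval, i.e., the limiting sense discussed below). The unconstrained formula is our assumption about the form of (7.51) for this particular map, not a transcription of it.

```python
import itertools
import random

# Hypothetical map f(x) = 3x mod 1: slope 3 on each segment [k/3, (k+1)/3), intercept -k.
TAU = 3.0
BETA = [0.0, -1.0, -2.0]


def f(x):
    k = min(int(3 * x), 2)
    return TAU * x + BETA[k]


def sse(x0, y):
    """Sum of squared residuals of the orbit from x0; log p(Y; x0) = C - sse / (2 sigma^2)."""
    total, x = 0.0, x0
    for yi in y:
        total += (yi - x) ** 2
        x = f(x)
    return total


def brute_force_ml(y):
    """Try every sequence of affine segments for x(0), ..., x(N-1); keep the best fit."""
    n = len(y) - 1
    best_cost, best_x = float("inf"), None
    for ks in itertools.product(range(3), repeat=n):
        # Initial conditions with this segment sequence form the interval [lo, lo + 3^-n).
        lo = sum(k * 3.0 ** -(i + 1) for i, k in enumerate(ks))
        hi = lo + 3.0 ** -n
        # On that interval, f^i(x) = a[i] * x + b[i].
        a, b = [1.0], [0.0]
        for k in ks:
            a.append(TAU * a[-1])
            b.append(TAU * b[-1] + BETA[k])
        # Unconstrained least-squares maximizer of the likelihood.
        num = sum(ai * (yi - bi) for ai, bi, yi in zip(a, b, y))
        x_max = num / sum(ai * ai for ai in a)
        # The cost is a convex quadratic in x, so the optimum over the closed
        # interval is x_max clipped to [lo, hi].
        x_hat = min(max(x_max, lo), hi)
        cost = sum((yi - (ai * x_hat + bi)) ** 2 for ai, bi, yi in zip(a, b, y))
        if cost < best_cost:
            best_cost, best_x = cost, x_hat
    return best_x, best_cost


random.seed(1)
n, sigma, x_true = 6, 0.05, 0.3141
orbit = [x_true]
for _ in range(n):
    orbit.append(f(orbit[-1]))
y = [xi + random.gauss(0.0, sigma) for xi in orbit]
x_hat, cost = brute_force_ml(y)
print(f"x_hat = {x_hat:.6f}, sum of squared residuals = {cost:.6f}")
```

The loop visits 3^N candidate sequences, which is what makes this approach impractical beyond small N.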

We conclude this section with a subtle theoretical issue involving ML state estimation and MC maps. An implicit assumption in the discussion thus far has been the existence of $\hat{x}_{ML}$. In fact, if $f$ is not continuous, $\hat{x}_{ML}$ may not exist except in a limiting sense. In particular, consider the situation in which $\hat{x}_{MAX}$ is not in $A(\hat{x}_{ML})$, so that the supremum of the likelihood function over $A(\hat{x}_{ML})$ is approached at an endpoint $x_e$ of $A(\hat{x}_{ML})$. However, $x_e$ may not belong to $A(\hat{x}_{ML})$ if $f$ is discontinuous at one of the points of the orbit segment $\{f^i(x_e)\}_{i=0}^{N}$. In other words, each endpoint of an affine segment domain belongs either to that segment domain or to the domain of the adjacent segment, and the endpoint has two possible images under $f$, depending on which domain it belongs to. The implication is that $x_e$ can be the ML estimate of $x$ only if it is in $A(\hat{x}_{ML})$. If $x_e \notin A(\hat{x}_{ML})$, then $\hat{x}_{ML}$ does not exist, since the likelihood function approaches its supremum along sequences of points in $A(\hat{x}_{ML})$ converging to $x_e$ but never attains it. Because $f$ has only finitely many discontinuities, the probability of this situation occurring is negligible. Consequently, in the discussion that follows we assume that $\hat{x}_{ML}$ exists and is unique, although the results are readily generalized to the case of multiple values of $x$ maximizing the likelihood function.
7.7  Optimal/Suboptimal  ML  State  Estimator  for  MC  Maps


7.7.1  Theoretical  Foundation 

We now show that for any MC map which gives rise to arbitrarily fine Markov partitions, one can in theory obtain the sequence of affine parameter pairs $\{(\tau(i,\hat{x}_{ML}),\beta(i,\hat{x}_{ML}))\}_{i=0}^{N}$ associated with $\hat{x}_{ML}$ without knowledge of $\hat{x}_{ML}$. As shown in the previous section, if the ML sequence of affine parameter pairs is known, it is straightforward to determine $\hat{x}_{ML}$ and the entire ML orbit segment. The theoretical result presented in this subsection provides the foundation for a practical state estimator introduced in the next subsection. For a given set of observations, the estimator is the optimal ML estimator if a certain HMM used by the estimator is chosen appropriately; otherwise, the estimator may be suboptimal.

The theoretical result presented in this subsection and the estimator introduced in the next both exploit the close relation between the likelihoods of orbit segments generated by MC maps for a given set of noisy observations and the likelihoods of state sequences of associated HMMs. To offer some insight into this relation, we consider an MC map $f$ along with a Markov partition $\{I_j\}_{j=1}^{J}$ and its corresponding Markov chain, with $S_j$ denoting the state associated with $I_j$. Just as we can associate a sequence of affine parameter pairs and affine segment domains with each orbit segment or initial condition, we can associate a sequence of affine parameter pairs $\{(\tau(i,S),\beta(i,S))\}_{i=0}^{N}$ and affine segment domains $\{A(i,S)\}_{i=0}^{N}$ with each state sequence $S = \{S(i)\}_{i=0}^{N}$ of the Markov chain. In particular, if $S(i) = S_j$ and $I_j \subset A_k$, we let $(\tau(i,S),\beta(i,S)) = (\tau_k,\beta_k)$ and $A(i,S) = A_k$. That is, for each $i$ we associate the affine parameter pair and segment domain of the affine segment of $f$ whose domain contains the partition element associated with $S(i)$.
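As a concrete instance of this association, consider the hypothetical map f(x) = 3x mod 1 (three affine segments A_k = [k/3, (k+1)/3) with slope 3 and intercept -k). For every m ≥ 1 it admits a Markov partition with J = 3^m equal elements I_j = [j/3^m, (j+1)/3^m), because f maps each I_j onto a union of three adjacent elements. The sketch below, with function names of our own choosing, builds the allowed transitions (p_ij > 0) for such a partition and maps a state sequence to its affine parameter pairs and affine segment domains by the rule just stated.

```python
from fractions import Fraction

# Hypothetical map f(x) = 3x mod 1: affine segment k has domain A_k = [k/3, (k+1)/3),
# slope TAU[k] = 3 and intercept BETA[k] = -k.
TAU = [3, 3, 3]
BETA = [0, -1, -2]


def markov_partition(m):
    """Partition elements I_j = [j/3^m, (j+1)/3^m) and successor lists (p_ij > 0)."""
    J = 3 ** m
    elements = [(Fraction(j, J), Fraction(j + 1, J)) for j in range(J)]
    # f maps I_j onto I_s, I_{s+1}, I_{s+2} with s = 3j mod J.
    successors = [[(3 * j) % J + r for r in range(3)] for j in range(J)]
    return elements, successors


def segment_of_state(j, m):
    """Index k of the affine segment domain A_k that contains I_j."""
    return (3 * j) // 3 ** m


def affine_sequence(states, m):
    """(tau(i, S), beta(i, S)) and A(i, S) for a state sequence S."""
    ks = [segment_of_state(j, m) for j in states]
    pairs = [(TAU[k], BETA[k]) for k in ks]
    domains = [(Fraction(k, 3), Fraction(k + 1, 3)) for k in ks]
    return pairs, domains


elements, successors = markov_partition(2)
print(affine_sequence([1, 4, 5], 2)[0])   # [(3, 0), (3, -1), (3, -1)]
```

Here I_1 = [1/9, 2/9) lies in A_0, while I_4 and I_5 lie in A_1, which gives the printed pairs; the sequence 1, 4, 5 is admissible because 4 is a successor of 1 and 5 is a successor of 4.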

We let $\{p_{ij}\}_{i,j=1}^{J}$ denote the state transition probabilities of the Markov chain corresponding to the given Markov partition; we let $p(y(i)\mid S_j)$ denote the output PDF associated with the $j$th state for either of the HMM models (quantized or unquantized outputs) introduced earlier; and we let $\pi(S_j)$ denote the initial state probability of the $j$th state. Then, for a given state sequence $S = \{S(i)\}_{i=0}^{N}$ and a given set of observations $Y = \{y(i)\}_{i=0}^{N}$, the joint PDF $p(S,Y)$ of the state sequence and observations is given by

$$p(S,Y) = \pi(S(0))\,p(y(0)\mid S(0)) \prod_{i=1}^{N} p_{S(i-1),S(i)}\,p(y(i)\mid S(i)). \tag{7.55}$$
The Viterbi algorithm [74, 75] is a computationally efficient algorithm for finding the state sequence which maximizes $p(S,Y)$. Because $p(S\mid Y) = p(S,Y)/p(Y)$ and $p(Y)$ does not depend on $S$, this state sequence also maximizes $p(S\mid Y)$, i.e., it is the most probable state sequence for the given observation set. We also have the following expression, which we use later, for $P(S) = P(S(0),\ldots,S(N))$, the probability of the state sequence $S$:

$$P(S) = \pi(S(0)) \prod_{i=1}^{N} p_{S(i-1),S(i)}. \tag{7.56}$$
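A minimal log-domain Viterbi sketch for maximizing (7.55) follows. It is a generic textbook formulation rather than code from the thesis; the two-state chain, the Gaussian output densities, and all parameter values are hypothetical. It also evaluates log P(S) from (7.56) and includes an exhaustive search, usable only for tiny problems, to check the dynamic-programming result.

```python
import itertools
import math


def gauss_logpdf(y, mean, sigma):
    """Log of a Gaussian output PDF p(y | S_j) with mean `mean`."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (y - mean) ** 2 / (2 * sigma ** 2)


def safe_log(p):
    """log(p), with log(0) = -inf for forbidden transitions or initial states."""
    return math.log(p) if p > 0 else -math.inf


def log_prior(states, pi, P):
    """log P(S) from (7.56)."""
    total = safe_log(pi[states[0]])
    for i in range(1, len(states)):
        total += safe_log(P[states[i - 1]][states[i]])
    return total


def log_joint(states, y, pi, P, means, sigma):
    """log p(S, Y) from (7.55): log P(S) plus the output log-densities."""
    return log_prior(states, pi, P) + sum(
        gauss_logpdf(yi, means[s], sigma) for yi, s in zip(y, states))


def viterbi(y, pi, P, means, sigma):
    """State sequence maximizing log p(S, Y), found by dynamic programming."""
    J = len(pi)
    score = [safe_log(pi[j]) + gauss_logpdf(y[0], means[j], sigma) for j in range(J)]
    back = []
    for yi in y[1:]:
        ptr, new = [], []
        for j in range(J):
            # Best predecessor of state j at this time step.
            r = max(range(J), key=lambda k: score[k] + safe_log(P[k][j]))
            ptr.append(r)
            new.append(score[r] + safe_log(P[r][j]) + gauss_logpdf(yi, means[j], sigma))
        back.append(ptr)
        score = new
    path = [max(range(J), key=lambda j: score[j])]
    for ptr in reversed(back):          # backtrack from the final state
        path.append(ptr[path[-1]])
    return path[::-1]


def brute_force(y, pi, P, means, sigma):
    """Exhaustive search over all J^(N+1) sequences; only feasible for tiny problems."""
    return max(itertools.product(range(len(pi)), repeat=len(y)),
               key=lambda s: log_joint(s, y, pi, P, means, sigma))


# Hypothetical two-state example.
pi = [0.5, 0.5]
P = [[0.9, 0.1], [0.2, 0.8]]
means, sigma = [0.0, 1.0], 0.4
y = [0.1, 0.8, 0.9, 0.2, 1.1]
best = viterbi(y, pi, P, means, sigma)
print(best, log_prior(best, pi, P))
```

The cost of the recursion is O(N J^2) rather than the O(J^(N+1)) of the exhaustive search.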


We now consider the set of state transition pseudo-probabilities $\{q_{ij}\}_{i,j=1}^{J}$, where $q_{ij} = 1$ if $p_{ij} > 0$ and $q_{ij} = 0$ if $p_{ij} = 0$; and we consider the set of initial state pseudo-probabilities $\{\eta(S_j)\}_{j=1}^{J}$, where $\eta(S_j) = 1$ for each $j$. With these state transition pseudo-probabilities and initial state pseudo-probabilities, we define the joint pseudo-PDF $q(S,Y)$ as

$$q(S,Y) = \eta(S(0))\,p(y(0)\mid S(0)) \prod_{i=1}^{N} q_{S(i-1),S(i)}\,p(y(i)\mid S(i)) \tag{7.57}$$

$$q(S,Y) = \begin{cases} \prod_{i=0}^{N} p(y(i)\mid S(i)) & \text{if } P(S) > 0, \\ 0 & \text{if } P(S) = 0. \end{cases} \tag{7.58}$$


As  with  (7.55),  we  can  use  the  Viterbi  algorithm  to  efficiently  determine  the  state  sequence 
which  maximizes  q(S,Y)  for  a  given  observation  set  Y. 

We now use these state transition pseudo-probabilities and initial state pseudo-probabilities with the HMM model for quantized orbit points introduced in Section 7.2.2. As in that section, we let $H_j$ denote the value associated with $S_j$, but we now require that $H_j \in I_j$. In other words, we require that the constant, quantized value associated with each state be a point in the partition element corresponding to that state. For a given state sequence $S$, we let $H(i)$ denote the quantized value associated with $S(i)$, i.e., $H(i) = H_j$ if $S(i) = S_j$. For this model and the earlier assumption that the observation noise sequence $\{v(i)\}_{i=0}^{N}$ is a Gaussian, white-noise sequence with variance $\sigma^2$, the following is true:

$$\log p(y(i)\mid S(i)) = C(0) - \frac{(y(i) - H(i))^2}{2\sigma^2}, \tag{7.59}$$
where $C(0)$ is a constant. Therefore, if $P(S) > 0$,

$$\log q(S,Y) = C(N) - \sum_{i=0}^{N} \frac{(y(i) - H(i))^2}{2\sigma^2}, \tag{7.60}$$

which is the same expression as $\log p(Y;x(0))$ given by (7.34) if we associate the sequence of state values $\{H(i)\}_{i=0}^{N}$ with the orbit segment $\{f^i(x(0))\}_{i=0}^{N}$. Thus, if the sequence $\{H(i)\}_{i=0}^{N}$ is treated as an orbit segment, then the ML state sequence $S_{ML} = \{S_{ML}(i)\}_{i=0}^{N}$, which we define as the state sequence that maximizes $q(S,Y)$ or, equivalently, $q(S\mid Y)$, is also the state sequence with nonzero probability (i.e., $P(S_{ML}) > 0$) that maximizes the log-likelihood function given by (7.34).
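Because each log q_ij and log η(S_j) is either 0 or -∞, maximizing (7.60) amounts to minimizing the sum of squared residuals Σ(y(i) - H(i))² over state sequences with P(S) > 0. The sketch below illustrates this min-sum Viterbi recursion under assumptions of our own: the hypothetical map f(x) = 3x mod 1 with its 3^m-element Markov partition, H_j taken as the midpoint of I_j (any point of I_j is allowed), and illustrative function names. An exhaustive check over admissible sequences is included for small cases.

```python
import itertools


def partition(m):
    """Hypothetical 3^m-element Markov partition of f(x) = 3x mod 1.

    Returns quantized values H_j (midpoints of I_j = [j/3^m, (j+1)/3^m)) and,
    for each state j, the successors s with q_js = 1 (equivalently p_js > 0).
    """
    J = 3 ** m
    H = [(j + 0.5) / J for j in range(J)]
    successors = [[(3 * j) % J + r for r in range(3)] for j in range(J)]
    return H, successors


def viterbi_pseudo(y, H, successors):
    """Admissible state sequence minimizing sum_i (y(i) - H(i))^2, i.e. maximizing (7.60)."""
    J = len(H)
    cost = [(y[0] - H[j]) ** 2 for j in range(J)]   # eta(S_j) = 1 for every state
    back = []
    for yi in y[1:]:
        new, ptr = [float("inf")] * J, [-1] * J
        for j in range(J):
            for s in successors[j]:                  # only transitions with q_js = 1
                c = cost[j] + (yi - H[s]) ** 2
                if c < new[s]:
                    new[s], ptr[s] = c, j
        cost = new
        back.append(ptr)
    path = [min(range(J), key=lambda j: cost[j])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def admissible(states, successors):
    """True if every transition has q_ij = 1 (so P(S) > 0 when all initial probabilities are positive)."""
    return all(b in successors[a] for a, b in zip(states, states[1:]))


def sq_error(states, y, H):
    return sum((yi - H[j]) ** 2 for yi, j in zip(y, states))


def brute_force(y, H, successors):
    """Exhaustive check over all admissible sequences; tiny problems only."""
    candidates = (s for s in itertools.product(range(len(H)), repeat=len(y))
                  if admissible(s, successors))
    return min(candidates, key=lambda s: sq_error(s, y, H))


m = 2
H, successors = partition(m)
y = [0.20, 0.58, 0.77, 0.33, 0.98]
s_ml = viterbi_pseudo(y, H, successors)
segments = [(3 * j) // 3 ** m for j in s_ml]   # affine segment index of each state
print(s_ml, segments)
```

The list `segments` is the sequence of affine segments, and hence of affine parameter pairs, associated with S_ML.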

In light of the close relation between (7.34) and (7.60), one might expect there to be a relation between the most likely state sequence $S_{ML}$ and the ML orbit segment $\{f^i(\hat{x}_{ML})\}_{i=0}^{N}$. It is plausible that for a given MC map, one can find a sufficiently fine Markov partition such that if $\{I_{ML}(i)\}_{i=0}^{N}$ denotes the sequence of partition elements associated with $S_{ML}$, then $f^i(\hat{x}_{ML}) \in I_{ML}(i)$ for each $i$. However, it is unclear how one might prove this relation. In Appendix B, we establish the weaker relation that for a given MC map, a given set of observations, and a sufficiently fine Markov partition, the sequence of affine parameter pairs associated with the most likely state sequence $S_{ML}$ is the same sequence as $\{(\tau(i,\hat{x}_{ML}),\beta(i,\hat{x}_{ML}))\}$, the sequence of affine parameter pairs associated with $\hat{x}_{ML}$. The value of this relation is that for a given MC map and set of observations, if we first select a Markov partition for which this equality of affine parameter sequences holds and then determine $S_{ML}$ and its associated sequence of affine parameter pairs, we can in theory evaluate $\hat{x}_{ML}$ and the entire ML orbit segment. In light of the discussion in the previous section, we can also obtain the ML orbit segment in practice if the map is an EMC map with affine segment slopes having absolute value greater than one and for which the image of each affine segment domain is a union of affine segment domains. In summary and more formally, we have the following result:

Proposition 4: For any MC map $f$ which gives rise to arbitrarily fine Markov partitions and for a given, finite set of observations $Y = \{y(i)\}_{i=0}^{N}$, one can determine the ML sequence of affine parameter pairs $\{(\tau(i,\hat{x}_{ML}),\beta(i,\hat{x}_{ML}))\}$ without knowledge of $\hat{x}_{ML}$, and as a consequence one can in theory determine the ML orbit segment $\{\hat{x}_{ML}(i)\}_{i=0}^{N}$ without testing every sequence of affine parameter pairs.

Proof: (see Appendix B)

Although choosing a Markov partition satisfying the conditions of the proof guarantees that the ML state sequence yields the sequence of affine parameter pairs corresponding to the ML orbit segment, a partition with fewer elements often works as well. A practical iterative method for choosing a sufficiently fine Markov partition is to start with a coarse partition, determine the sequence of affine parameter pairs corresponding to the ML state sequence, and then iteratively select refinements of the partition until the corresponding sequences of affine parameter pairs for the ML state sequences remain the same. In addition, as suggested by the proof of the proposition, increasingly finer Markov partitions may be needed as the number of observations $N$ increases. For larger values of $N$, an alternative, possibly suboptimal ML estimator involves using a Markov partition which may not be sufficiently fine to ensure that the sequence of affine parameter pairs associated with the ML state sequence of the corresponding Markov partition is the same sequence as that associated with the ML orbit segment.
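The iterative refinement heuristic can be sketched as a loop over successively finer partitions. This sketch again assumes the hypothetical map f(x) = 3x mod 1, whose 3^(m+1)-element partition refines its 3^m-element partition, and stops when the affine segment sequence of the ML state sequence is unchanged between two consecutive refinements. The cap `m_max` and the function names are our own. The stopping rule is a heuristic: it does not check the sufficient condition in the proof of Proposition 4.

```python
def viterbi_segments(y, m):
    """Affine segment indices of the ML state sequence for a 3^m-element partition
    of the hypothetical map f(x) = 3x mod 1 (min-sum Viterbi, pseudo-probabilities)."""
    J = 3 ** m
    H = [(j + 0.5) / J for j in range(J)]          # H_j is the midpoint of I_j
    cost = [(y[0] - h) ** 2 for h in H]
    back = []
    for yi in y[1:]:
        new, ptr = [float("inf")] * J, [-1] * J
        for j in range(J):
            for s in range((3 * j) % J, (3 * j) % J + 3):   # q_js = 1
                c = cost[j] + (yi - H[s]) ** 2
                if c < new[s]:
                    new[s], ptr[s] = c, j
        cost = new
        back.append(ptr)
    path = [min(range(J), key=lambda j: cost[j])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return [(3 * j) // J for j in path]


def refine_until_stable(y, m_start=1, m_max=8):
    """Refine (3^m -> 3^(m+1) elements) until the affine segment sequence repeats."""
    previous = viterbi_segments(y, m_start)
    for m in range(m_start + 1, m_max + 1):
        current = viterbi_segments(y, m)
        if current == previous:
            return current, m
        previous = current
    return previous, m_max   # no agreement within the cap


y = [0.21, 0.62, 0.85, 0.55, 0.66, 0.98]
segs, m_used = refine_until_stable(y)
print(segs, m_used)
```

The partition size grows geometrically with m, which is one reason the text suggests accepting a possibly suboptimal, coarser partition for larger N.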

7.7.2  The  Estimation  Algorithm 

As noted earlier, the solution of (7.51) yields $\hat{x}_{ML}$ only if it lies in $A(\hat{x}_{ML})$, and it is often impractical to determine $A(\hat{x}_{ML})$, especially for larger values of $N$. Also noted earlier was the fact that the relation $\hat{x}_{ML}(i) = f^i(\hat{x}_{ML})$ is often of little practical value with EMC maps, because the expansive nature of these maps amplifies computer round-off error in $\hat{x}_{ML}$. However, we can circumvent both practical problems with EMC maps for which the affine segment slopes have absolute values greater than one and for which each affine segment domain maps onto a union of affine segment domains. For these maps, we can first find $\hat{x}_{ML}(N)$, generally a far simpler task, and from it calculate the ML orbit segment as $\{f^{(N-i)}_{(\hat{x}_{ML},N)}(\hat{x}_{ML}(N))\}_{i=0}^{N}$. The two EMC maps used for the examples in the next subsection both satisfy these constraints.

Fusing these practical considerations with the theoretical result presented in the previous subsection leads to the following practical algorithm for estimating the ML orbit segment from a set of observations $Y$. The algorithm applies to any EMC map $f$ which gives rise to arbitrarily fine Markov partitions, whose affine segment slopes have absolute values greater than one, and for which each affine segment domain maps onto a union of affine segment domains.
Optimal/Suboptimal  ML  State  Estimator

1. Given an $(N+1)$-point observation set $Y(0,N)$ and a Markov partition for $f$, find the ML state sequence for the quantized-output HMM model associated with this partition, in which the quantized output value for each state is any point in the corresponding partition element and which uses the state transition pseudo-probabilities and initial state pseudo-probabilities defined in the previous subsection. Let $S_{ML}$ denote this state sequence.

2. Given the ML state sequence, find the associated sequences of affine parameter pairs $\{(\tau(i,S_{ML}),\beta(i,S_{ML}))\}_{i=0}^{N}$ and affine segment domains $\{A(i,S_{ML})\}_{i=0}^{N}$. If the Markov partition is fine enough to satisfy the sufficient condition specified in the proof of Proposition 4, these sequences will be the same as those associated with the ML orbit segment. If not, the sequences may differ from those associated with the ML orbit segment, and the algorithm may not be optimal.

3. Evaluate (7.52) for $\hat{x}_{MAX}(N)$ using the sequence of affine parameter pairs associated with the ML state sequence.

4. If $\hat{x}_{MAX}(N) \in A(N,S_{ML})$, set $\hat{x}_{ML}(N)$ equal to $\hat{x}_{MAX}(N)$. If not, set $\hat{x}_{ML}(N)$ equal to the endpoint of $A(N,S_{ML})$ for which $\log p(Y;x(N))$ has the larger value.

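5. Let $\{f^{(i)}_{(S_{ML},N)}\}_{i=0}^{N}$ denote the functions implicitly defined by $\{(\tau(i,S_{ML}),\beta(i,S_{ML}))\}_{i=0}^{N}$ (analogous to the functions $\{f^{(i)}_{(\hat{x}_{ML},N)}\}_{i=0}^{N}$ defined earlier). In particular, we have the following for $i = 1$ and $i = 2$:

$$x(N-1) = f^{(1)}_{(S_{ML},N)}(x(N)) = \frac{x(N) - \beta(N-1,S_{ML})}{\tau(N-1,S_{ML})} \tag{7.61}$$

$$x(N-2) = f^{(2)}_{(S_{ML},N)}(x(N)) = \frac{x(N-1) - \beta(N-2,S_{ML})}{\tau(N-2,S_{ML})} \tag{7.62}$$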


6. Calculate the reverse orbit segment $\{\hat{x}_{ML}(N-i) = f^{(i)}_{(S_{ML},N)}(\hat{x}_{ML}(N))\}_{i=0}^{N}$. This orbit segment is the ML orbit segment if the Markov partition is sufficiently fine.
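Steps 1 through 6 can be combined into one short sketch. It is a minimal illustration under stated assumptions, not the thesis's implementation: the map is the hypothetical f(x) = 3x mod 1 (slope 3, three segments, each mapped onto [0, 1)), the partition has 3^m equal elements with H_j at their midpoints, and in Step 3 we take (7.52) to be the least-squares maximizer in x(N) obtained by writing each x(i) as an affine function c_i x(N) + d_i through the inverse affine maps. That form of (7.52) is our assumption for this Gaussian-noise setting.

```python
import random

# Hypothetical EMC map f(x) = 3x mod 1: segment k = [k/3, (k+1)/3), slope 3, intercept -k.
TAU = 3.0
BETA = [0.0, -1.0, -2.0]


def f(x):
    k = min(int(3 * x), 2)
    return TAU * x + BETA[k]


def ml_segments(y, m):
    """Steps 1-2: min-sum Viterbi over a 3^m partition, then affine segment indices."""
    J = 3 ** m
    H = [(j + 0.5) / J for j in range(J)]            # H_j is a point of I_j
    cost = [(y[0] - h) ** 2 for h in H]
    back = []
    for yi in y[1:]:
        new, ptr = [float("inf")] * J, [-1] * J
        for j in range(J):
            for s in range((3 * j) % J, (3 * j) % J + 3):   # q_js = 1
                c = cost[j] + (yi - H[s]) ** 2
                if c < new[s]:
                    new[s], ptr[s] = c, j
        cost = new
        back.append(ptr)
    path = [min(range(J), key=lambda j: cost[j])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return [(3 * j) // J for j in path]


def estimate_orbit(y, m):
    """Steps 1-6 for the hypothetical map; returns the orbit estimate and segment indices."""
    n = len(y) - 1
    ks = ml_segments(y, m)
    # Write x(i) = c[i] * x(N) + d[i] using the inverse affine maps.
    c, d = [0.0] * (n + 1), [0.0] * (n + 1)
    c[n] = 1.0
    for i in range(n - 1, -1, -1):
        c[i] = c[i + 1] / TAU
        d[i] = (d[i + 1] - BETA[ks[i]]) / TAU

    def sse(xn):
        return sum((y[i] - (c[i] * xn + d[i])) ** 2 for i in range(n + 1))

    # Step 3: least-squares maximizer in x(N) (our assumed form of (7.52)).
    x_max = sum(c[i] * (y[i] - d[i]) for i in range(n + 1)) / sum(ci * ci for ci in c)
    # Step 4: accept it if it lies in A(N, S_ML); otherwise use the better endpoint.
    lo, hi = ks[n] / 3.0, (ks[n] + 1) / 3.0
    x_n = x_max if lo <= x_max < hi else min((lo, hi), key=sse)
    # Steps 5-6: backward recursion (7.61)-(7.62) from x(N) down to x(0).
    orbit = [x_n]
    for i in range(n - 1, -1, -1):
        orbit.append((orbit[-1] - BETA[ks[i]]) / TAU)
    orbit.reverse()
    return orbit, ks


random.seed(7)
n, sigma = 10, 1e-4
truth = [0.4242]
for _ in range(n):
    truth.append(f(truth[-1]))
y = [x + random.gauss(0.0, sigma) for x in truth]
est, ks = estimate_orbit(y, m=3)
print(max(abs(a - b) for a, b in zip(est, truth)))
```

Only the final point is solved for directly; every earlier point comes from the contracting inverse recursion, which is what keeps round-off under control.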


7.7.3  Estimation  Examples 

Figures 7-17 (a) and (b) depict the performance results obtained by applying the ML estimator to the maps shown in Figures 7-16 (a) and (b), respectively. Both maps were used for examples earlier in the chapter. Each plotted SNR gain reflects the average improvement in SNR obtained by estimating each point of a 100-point orbit segment. More precisely, for each input SNR, 1000 independent observation sets were generated with the same 100-point orbit segment. The estimator was applied to each of the observation sets, and the average SNR improvement in estimating the 100 orbit segment points was calculated for each observation set. The 1000 values of average SNR improvement were then averaged and used in the figure as the plotted SNR gain for the corresponding input SNR.

Figure 7-16: Two EMC maps. (a) $g_1$; (b) $g_2$.

Figure 7-17: Performance results for estimating a 100-point orbit segment. (a) $g_1$; (b) $g_2$.
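The averaging procedure can be written as a short Monte Carlo loop. In this sketch the SNR improvement for one observation set is taken to be 10 log10(σ² / mean squared error over the orbit points), in dB; the text does not spell out the exact formula, so that definition, the function names, and the stand-in estimator are our assumptions. The optional `omit_last` argument mirrors the variant discussed below in which the last 30 points are excluded from the gain calculation.

```python
import math
import random


def snr_gain_db(estimate, truth, sigma, omit_last=0):
    """Assumed definition: 10 log10(noise variance / mean squared error) for one set."""
    n_used = len(truth) - omit_last
    mse = sum((estimate[i] - truth[i]) ** 2 for i in range(n_used)) / n_used
    return 10.0 * math.log10(sigma ** 2 / mse)


def average_snr_gain(truth, sigma, estimator, trials=1000, omit_last=0, seed=0):
    """Average the per-set SNR gain over independent noisy observation sets of one orbit."""
    rng = random.Random(seed)
    gains = []
    for _ in range(trials):
        y = [x + rng.gauss(0.0, sigma) for x in truth]
        gains.append(snr_gain_db(estimator(y), truth, sigma, omit_last))
    return sum(gains) / len(gains)


# Stand-in "estimator" that returns the observations unchanged: its gain should be near 0 dB.
truth = [0.1 * i for i in range(100)]
print(average_snr_gain(truth, 0.05, estimator=lambda y: y, trials=200))
```

In practice `estimator` would be an orbit estimator such as the one described in the previous subsection, and `truth` the fixed 100-point orbit segment.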

The curves are parameterized by the number of states in the HMM model used to estimate the sequence of affine parameter pairs associated with the ML orbit segment. Also shown in the figures is the upper bound on the SNR gain provided by the Cramer-Rao bound for unbiased estimators. Using the more general results in [76], one can show that the Cramer-Rao bound for fixed-interval smoothing with one-dimensional maps is given by $\sigma^2/N$, where $N$ is the number of orbit segment points and $\sigma^2$ is the observation noise variance.
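As a worked example, if the SNR gain is measured as the ratio of the observation noise variance to the mean squared estimation error (an assumption about the gain definition used in the figures), the stated bound caps the gain of any unbiased estimator at

$$G_{\max} = 10\log_{10}\!\left(\frac{\sigma^2}{\sigma^2/N}\right) = 10\log_{10} N\ \text{dB}, \qquad N = 100 \;\Rightarrow\; G_{\max} = 20\ \text{dB}.$$

Under that assumption the bound does not depend on the input SNR, so it would appear as a horizontal line in plots of SNR gain versus input SNR.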


The figures suggest that the ML estimator is superoptimal, in the sense that the SNR gain exceeds the Cramer-Rao bound at some input SNRs. However, this is not the case: the ML estimator is biased for these maps, and thus the Cramer-Rao bound for unbiased estimators applies to the estimator only in an asymptotic sense, as the input SNR tends to infinity. Although deriving an analytic expression for the bias appears to be intractable, we can examine its behavior in simulations. Figures 7-18 (a) and (b) depict the absolute value of the bias at each orbit point for the results shown in Figures 7-17 (a) and (b), respectively. The larger bias values at the end of the orbit segment are understandable in light of the fact that the squared estimation error is largest at the end of the segment. The performance results for both maps exhibit a threshold effect, with significant SNR gain for input SNRs above a threshold and mediocre or negligible SNR gain for input SNRs below it. Such an effect is typical of ML estimators for nonlinear estimation problems, with the threshold indicating the input SNR value below which nonlinearities strongly influence the likelihood function. Large SNR gains at larger input SNRs are not peculiar to the chosen orbit segment, as suggested by Figures 7-19 (a) and (b), which depict the average of the performance results for 100 randomly chosen 100-point orbit segments.

Figure 7-18: Estimator bias for results in Figure 7-17. (a) $g_1$; (b) $g_2$.
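The bias curves come from simulation rather than a closed form. A minimal sketch of the per-point computation, assuming the estimates from all trials are stored as equal-length lists (the function name and the toy numbers are our own), is:

```python
def empirical_abs_bias(estimates, truth):
    """|(1/T) * sum_t x_hat_t(i) - x(i)| for each orbit point i over T trials."""
    trials = len(estimates)
    return [abs(sum(est[i] for est in estimates) / trials - truth[i])
            for i in range(len(truth))]


truth = [0.2, 0.5, 0.8]
estimates = [[0.21, 0.52, 0.70], [0.19, 0.50, 0.74]]
print(empirical_abs_bias(estimates, truth))
```

With 1000 trials per input SNR, as in the experiments described above, the sampling error in each bias estimate is roughly the per-point error standard deviation divided by the square root of 1000.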

Because both maps are noninvertible, one-dimensional, and exhibit sensitive dependence on initial conditions, estimation accuracy improves as the number of observations at future times increases (as indicated by the performance bound analysis in Chapter 5). As a result, and as noted above, at higher input SNRs the estimation errors associated with the final points of an orbit segment dominate the average of squared estimation errors over all orbit points. Figures 7-20 (a) and (b) depict the performance results for the same conditions as used for the results in Figures 7-17 (a) and (b), but with the estimation errors for the last 30 points of the estimated 100-point orbit segment omitted from the SNR gain calculation. A comparison of corresponding pairs of figures reveals the dominant role that the estimation errors for these omitted points play in the average squared estimation error.

Figure 7-19: Average of performance results for estimating 100 randomly chosen 100-point orbit segments. (a) $g_1$; (b) $g_2$.

Figure 7-20: Performance results for estimating 100-point orbit segments with estimates of the last 30 points not used in the gain calculation. (a) $g_1$; (b) $g_2$.

Finally, the question arises as to the value of using an HMM model to estimate the sequence of affine parameter pairs associated with the ML orbit segment. Figures 7-21 (a) and (b) depict the performance results obtained by using the sequence of affine parameter pairs associated with the observation set $Y$. That is, for each $i$ the parameter pair $(\tau_j,\beta_j)$ for which $y(i) \in A_j$ was used for $(\tau(i,\hat{x}_{ML}),\beta(i,\hat{x}_{ML}))$. The poor performance results suggest the necessity and value of using an HMM to estimate the sequence of affine parameter pairs associated with the ML orbit segment.
Figure 7-21: Performance results for estimating a 100-point orbit segment with the sequence of affine parameter pairs associated with the observation set used as the estimated sequence associated with the ML orbit segment. (a) $f_1$; (b) $f_2$.
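The naive alternative behind Figure 7-21 reads each affine segment directly off the observation. A minimal sketch under the same hypothetical f(x) = 3x mod 1 assumption (function names are our own) shows why this is fragile: an observation that noise pushes across a segment boundary selects the wrong inverse branch at that time, and the rate of such errors grows with the noise level.

```python
import random

TAU = 3.0
BETA = [0.0, -1.0, -2.0]   # hypothetical map f(x) = 3x mod 1


def segment(x):
    """Affine segment index, with values outside [0, 1) clamped to the end segments."""
    return min(max(int(3 * x), 0), 2)


def f(x):
    return TAU * x + BETA[segment(x)]


def naive_segments(y):
    """Affine segment read directly from each observation y(i)."""
    return [segment(yi) for yi in y]


def mismatch_rate(sigma, n=100, trials=200, seed=0):
    """Fraction of points whose naive segment differs from the true segment."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        x = [rng.random()]
        for _ in range(n - 1):
            x.append(f(x[-1]))
        y = [xi + rng.gauss(0.0, sigma) for xi in x]
        wrong += sum(a != b for a, b in zip(naive_segments(y), naive_segments(x)))
    return wrong / (n * trials)


for sigma in (0.0, 0.01, 0.05, 0.1):
    print(sigma, mismatch_rate(sigma))
```

Each mismatch corresponds to a wrong affine parameter pair in the backward recursion, whereas the HMM-based approach chooses the segment sequence jointly over the whole observation set.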


Chapter  8 


Conclusions 

8.1  Summary  and  Contributions 

This thesis has dealt with the analysis and synthesis of chaotic maps and time-sampled chaotic flows, with a focus on the problems and issues that arise with noise-corrupted orbit segments generated by these systems. Both dissipative systems and nondissipative systems have been considered, with both types of systems considered in the context of analysis and the latter type also considered in the context of synthesis. With respect to dissipative systems, three suboptimal probabilistic state estimation algorithms (an ML estimator based on grid search, a local MMSE estimator based on extended Kalman smoothing, and a global MMSE estimator based on a finite-sum approximation to the conditional mean integral) have been introduced, and their performance has been experimentally assessed on three different problem scenarios: known system dynamics; unknown system dynamics but availability of a noise-free reference orbit; and unknown system dynamics with no noise-free reference orbit available. Both the ML and local MMSE estimators exploit the topological transitivity of dissipative, chaotic systems when restricted to their steady-state attractors, and both estimators are potentially effective for all three problem scenarios. The global MMSE estimator exploits the existence of an ergodic measure on a steady-state chaotic attractor, which allows the substitution of an infinite summation for the integral defining the conditional mean (the optimal MMSE state estimator). One feature of the global MMSE estimator is that it converges to the optimal MMSE state estimator as the number of terms in the finite summation defining the estimator goes to infinity.

The assumed determinism in the system dynamics facilitates the derivation of upper bounds (Cramer-Rao, Barankin, and Weiss-Weinstein) on state estimator performance. These bounds have been derived for the state estimation problem of interest in the thesis, and their behavior has been experimentally analyzed on two dissipative, chaotic systems: the Henon map and the time-sampled Lorenz flow. The Cramer-Rao and Barankin bounds have been shown to provide potentially useful information on the influence of fundamental properties of dissipative, chaotic diffeomorphisms (positive Lyapunov exponents and boundedness of attractors) on achievable state estimator performance when the unknown state is nonrandom. In contrast, the random Cramer-Rao and Weiss-Weinstein bounds have been shown to be of limited value in the context of initial condition estimation with dissipative, chaotic diffeomorphisms when the unknown initial condition is a random vector.

With  respect  to  nondissipative  maps,  the  thesis  has  considered  a  class  of  piecewise  linear 
unit-interval  maps,  members  of  which  give  rise  to  finite-state,  homogeneous  Markov  chains. 
The  thesis  has  presented  known  properties  of  these  maps,  established  additional  properties, 
and explored the potential value of these maps as generators of signals for practical applications. A close relation between noise-corrupted orbit segments generated by the maps
and  hidden  Markov  models  has  been  established,  and  this  relation  has  been  exploited  in 
practical,  optimal  and  suboptimal  algorithms  for  detection,  parameter  estimation,  and  state 
estimation  with  the  maps. 

This thesis has made two principal contributions to the research community. First, it has established a rigorous, probabilistic foundation for the problem of state estimation with deterministic, dissipative, chaotic diffeomorphisms. In particular, the thesis has shown how the existence of invariant measures on steady-state chaotic attractors facilitates practical state estimation with dissipative, chaotic systems. In addition, the thesis has shown the value of the Cramer-Rao and Barankin bounds for assessing the influence of fundamental properties of chaotic systems, including the existence of positive Lyapunov exponents, boundedness of attractors, and system invertibility or noninvertibility, on theoretically achievable state estimator performance with these systems. Finally, the thesis has exposed the limitations of performance bounds for estimators of random parameters when applied to dissipative, chaotic systems.

The second principal contribution of the thesis involves its consideration of MC maps. In particular, by assimilating known properties of these maps and establishing additional ones, the thesis has identified a potentially useful and versatile source of signal generators. In addition, the detection, parameter estimation, and state estimation algorithms introduced in the thesis for use with these maps may prove to be of value in applications involving the maps. These algorithms may also provide useful insight into the design of optimal, effective, and robust algorithms for detection, parameter estimation, and state estimation with other classes of chaotic systems, such as dissipative maps and flows.


8.2  Suggested  Future  Research 

This  thesis  has  either  partially  or  totally  resolved  several  research  problems  involving  chaotic 
systems;  and  as  a  result,  several  new  research  problems  have  emerged.  Consequently,  the 
analysis,  algorithms,  and  experimental  results  presented  in  this  thesis  implicitly  suggest  a 
number  of  interesting,  challenging,  and  potentially  fruitful  topics  for  future  research. 

With  respect  to  state  estimation  with  dissipative  maps,  the  state  estimation  algorithms 
and  performance  bound  derivations  introduced  in  Chapters  4  and  5  may  have  shed  new 
light  on  the  problem  of  state  estimation  with  chaos,  but  optimal,  practical,  probabilistic 
state  estimation  with  chaos  remains  an  elusive  goal.  It  would  be  useful  to  explore  methods 
to  refine  the  approximate  ML  and  MMSE  state  estimators  introduced  in  Chapter  4  so  that 
the  simplifying,  practical,  heuristic  elements  of  the  estimators  are  avoided.  Alternatively, 
it  would  be  useful  to  establish  a  theoretical  justification  for  these  heuristic  elements  and 
formulate  theoretically  motivated  guidelines  for  choosing  values  for  the  various  parameters 
which  these  elements  introduce  into  the  estimators.  In  the  pursuit  of  these  tasks,  it  might  be 
useful  to  replace  the  deterministic  state  equation  used  throughout  the  thesis  by  a  sequence 
of  noise-driven  state  equations  with  the  deterministic  equation  as  the  limit. 

An additional topic for future research arises from the similarities between the global MMSE state estimator introduced in Chapter 4 and the hidden Markov modeling (HMM) estimator introduced in [63], which is also a global estimator. Each of the estimators has strengths and weaknesses relative to the other, and computer experiments suggest that each is a potentially effective state estimator with dissipative, chaotic maps. A useful research task would involve merging the two estimators in an effort to create a hybrid state estimator that retains the strengths and avoids the weaknesses of the individual estimators. Useful insight into both estimators might be obtained by studying them in conjunction with the class of MC maps introduced in Chapter 6. In fact, as noted at the beginning of that chapter, our original purpose in studying MC maps was to better understand the theoretical and practical properties of the HMM estimator. A related task, not considered in the thesis, would involve extending the global MMSE estimator to the self-cleaning problem scenario, that is, the state estimation scenario in which the system dynamics are unknown and a noise-free reference orbit segment is not available. Computer experiments suggest that the estimator in its present form is not useful for this scenario.

With respect to state estimator performance bounds, several questions remain unresolved and several new research problems have emerged, suggesting a number of challenging, potentially beneficial research tasks. Perhaps the most important and useful of these tasks would be the development of new state estimator performance bounds for use with dissipative, chaotic diffeomorphisms when the unknown state is a random vector. The importance of this task stems from the experimental results presented in Chapter 5, which graphically indicated that existing performance bounds for estimators of random parameters have limited value with dissipative, chaotic systems. In addition, many of the fundamental conclusions of that chapter involving the influence of Lyapunov exponents, attractor boundedness, and system invertibility on achievable state estimation performance are strictly applicable only to unbiased state estimators. It appears that this restriction on the bias is an intrinsic aspect of all existing performance bounds for nonrandom-parameter estimators, avoidable only by using estimator-dependent bounds or by treating the unknown parameter as random. In light of this, it appears that extending the fundamental conclusions of the chapter to arbitrary state estimators, both unbiased and biased, requires consideration of performance bounds for estimators of random parameters, thereby underscoring the need for new, effective performance bounds for the problem of state estimation with chaotic systems when the unknown state is random.

Several unresolved or unexplored issues concerning the unit-interval maps considered in Chapters 6 and 7 remain, and a number of worthwhile, potentially fruitful research tasks should be undertaken to resolve them. For example, as suggested in Chapter 6, MC maps might be useful signal generators in light of the rich set of properties these maps exhibit. It would be useful to ascertain the strengths and weaknesses of using MC maps, and maps derived from them, relative to other techniques for generating Markov chains and processes with specified stationary PDFs. It would also be useful to identify specific, practical applications in which MC maps might be used, such as secure communication, and to assess the value of using MC maps for these applications. An unrelated research task involves the detection algorithm introduced in Chapter 7 for discriminating among EMC maps based on noise-corrupted, unquantized orbit segments. Although the algorithm is suboptimal, it appears to converge, in an appropriately defined sense, to the optimal minimum-probability-of-error detector as the Markov partition used by the detector becomes increasingly fine. It would be useful to establish the convergence properties of the detector formally and conclusively.

Only a relatively small class of deterministic unit-interval maps was considered in Chapters 6 and 7. However, this class of maps is dense (in an appropriately defined sense) in a much larger set of unit-interval maps [14, 46]. It would be useful to determine whether the detection and estimation algorithms introduced in Chapter 7 can be extended to this larger set of maps. Similarly, it would be useful to determine whether the properties of MC maps are shared by other classes of deterministic maps and flows and, if so, whether the detection and estimation algorithms introduced in the thesis can be extended to these other classes of deterministic systems as well.
Appendix A


Performance  Bound  Equations 

A.1 Cramer-Rao Bound


In this section, we derive the Cramer-Rao bound for the problem scenario of interest in Chapter 5 when the unknown state to be estimated is nonrandom. The derivation of the Cramer-Rao bound for deterministic, nonlinear systems has appeared elsewhere (see, e.g., [85]); we include it here for completeness and because we consider a more general problem scenario than has been treated in the past, one involving a variable number of observations occurring at past and future times. In particular, we derive the Fisher information matrix $J(x_{n_0})$, the inverse of which provides the Cramer-Rao bound on the error covariance matrix of any unbiased estimator of $x(n_0)$ when $x(n_0) = x_{n_0}$, where $x_{n_0}$ is the actual value of the state at time $n_0$.

For the DTS/DTO model given by (5.1) and (5.2), and with the slightly more general assumption that the observation noise sequence is white but not necessarily Gaussian, with the PDF of each sequence element given by $p_v(v)$, the PDF $p(y(i); x(i))$ is given by

$$ p(y(i); x(i)) = p_v\big(y(i) - h(x(i))\big) \tag{A.1} $$

and since $x(i) = f^{i-n_0}(x(n_0))$ and the noise is white, it follows that the likelihood function $p(Y; x(n_0))$ is given by

$$ p(Y; x(n_0)) = \prod_{i=M}^{N} p_v\big(y(i) - h(x(i))\big) \tag{A.2} $$

$$ = \prod_{i=M}^{N} p_v\big(y(i) - h(f^{i-n_0}(x(n_0)))\big), \tag{A.3} $$

and therefore

$$ \log p(Y; x(n_0)) = \sum_{i=M}^{N} \log p_v\big(y(i) - h(f^{i-n_0}(x(n_0)))\big). \tag{A.4} $$
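For a concrete feel for (A.4), the following minimal Python sketch evaluates the log-likelihood under assumptions chosen only for illustration: the Hénon map $f(x) = (1 - 1.4x_1^2 + x_2,\ 0.3x_1)$ stands in for the chaotic system, $h$ is the identity, the noise is Gaussian with $R = \sigma^2 I$, and all observations lie at or after $n_0$ (so $M = n_0$ and no inverse iterates are needed). The map, the parameter values, and the function names are hypothetical.

```python
import numpy as np

# Hypothetical example system: the Henon map with its classical parameters.
A_HENON, B_HENON = 1.4, 0.3

def henon(x):
    """One application of f to the 2-vector x."""
    return np.array([1.0 - A_HENON * x[0] ** 2 + x[1], B_HENON * x[0]])

def orbit(x0, n_steps):
    """Return x(n0), f(x(n0)), ..., f^n_steps(x(n0)) as the rows of an array."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(n_steps):
        xs.append(henon(xs[-1]))
    return np.array(xs)

def log_likelihood(y, x0, sigma):
    """log p(Y; x(n0)) from (A.4) with h = identity, M = n0, and R = sigma^2 I.

    Row k of y holds the observation y(n0 + k).
    """
    residuals = y - orbit(x0, len(y) - 1)   # v(i) = y(i) - f^{i-n0}(x(n0))
    dim = y.shape[1]
    log_norm = -0.5 * dim * np.log(2.0 * np.pi * sigma ** 2)
    return float(np.sum(log_norm - 0.5 * np.sum(residuals ** 2, axis=1) / sigma ** 2))

# Usage: noisy observations of a short orbit segment (seeded for repeatability).
rng = np.random.default_rng(0)
x_true, sigma = np.array([0.1, 0.1]), 1e-3
y = orbit(x_true, 10) + sigma * rng.standard_normal((11, 2))
ll_true = log_likelihood(y, x_true, sigma)
ll_perturbed = log_likelihood(y, x_true + 1e-2, sigma)
```

With noise this small, the true initial state yields a much larger log-likelihood than a nearby perturbed state, because the perturbation is amplified along the orbit.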

To establish the Cramer-Rao bound for this problem scenario, we need to determine the Fisher information matrix $J(x_{n_0})$, the general form of which is given by

$$ J(x_{n_0}) = E_{Y;x_{n_0}}\Big\{ D^T_{x(n_0)}\{\log p(Y; x_{n_0})\}\, D_{x(n_0)}\{\log p(Y; x_{n_0})\} \Big\} \tag{A.5} $$

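where $D_{x(n_0)}\{\log p(Y; x_{n_0})\}$ denotes the derivative of $\log p(Y; x(n_0))$ taken with respect to $x(n_0)$ and evaluated at $x_{n_0}$. Applying the chain rule of vector differentiation to (A.4) yields

$$ D_{x(n_0)}\{\log p(Y; x_{n_0})\} = -\sum_{i=M}^{N} D\{\log p_v(v(i))\}\, D_{x(n_0)}\{h(f^{i-n_0}(x_{n_0}))\} \tag{A.6} $$

where $v(i) = y(i) - h(f^{i-n_0}(x(n_0)))$ and $D\{\log p_v(v(i))\}$ denotes the derivative of $\log p_v$ with respect to its argument, evaluated at $v(i)$; the minus sign comes from the $-h(\cdot)$ term in $v(i)$ and cancels in the products that form the Fisher information matrix. Note that $D_{x(n_0)}\{h(f^{i-n_0}(x(n_0)))\}$ is expressible as a product of derivatives of $f$ or $f^{-1}$, specifically

$$ D_{x(n_0)}\{h(f^{i-n_0}(x(n_0)))\} = D\{h(f^{i-n_0}(x(n_0)))\}\, D\{f(x(i-1))\} \cdots D\{f(x(n_0))\} \tag{A.7} $$

for $i \ge n_0 + 2$, and as

$$ D_{x(n_0)}\{h(f^{i-n_0}(x(n_0)))\} = D\{h(f^{i-n_0}(x(n_0)))\}\, D\{f^{-1}(x(i+1))\} \cdots D\{f^{-1}(x(n_0))\} \tag{A.8} $$

for $i \le n_0 - 2$, with analogous expressions for $n_0 - 1 \le i \le n_0 + 1$. Here $D\{h(f^{i-n_0}(x(n_0)))\}$ is the derivative of $h$ evaluated at $x(i) = f^{i-n_0}(x(n_0))$.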
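The product structure of (A.7) can be checked numerically. The sketch below, again assuming the Hénon map as a hypothetical example with $h$ the identity, accumulates the Jacobian factors along the orbit and compares the result with a central-difference approximation. Only the forward case is shown; the backward case (A.8) would use the Jacobian of $f^{-1}$ in the same way. All names are illustrative.

```python
import numpy as np

# Hypothetical example map (Henon, classical parameters) and its Jacobian D{f}.
A_HENON, B_HENON = 1.4, 0.3

def henon(x):
    return np.array([1.0 - A_HENON * x[0] ** 2 + x[1], B_HENON * x[0]])

def henon_jacobian(x):
    return np.array([[-2.0 * A_HENON * x[0], 1.0],
                     [B_HENON, 0.0]])

def jacobian_of_iterate(x0, k):
    """D_{x(n0)}{f^k(x(n0))} = D{f(x(n0+k-1))} ... D{f(x(n0))}, the product in (A.7)."""
    x = np.asarray(x0, dtype=float)
    jac = np.eye(2)
    for _ in range(k):
        jac = henon_jacobian(x) @ jac   # the newest factor multiplies from the left
        x = henon(x)
    return jac

def finite_difference_jacobian(x0, k, eps=1e-6):
    """Central-difference approximation of D f^k at x0, used only as a check."""
    x0 = np.asarray(x0, dtype=float)
    cols = []
    for j in range(2):
        step = np.zeros(2)
        step[j] = eps
        xp, xm = x0 + step, x0 - step
        for _ in range(k):
            xp, xm = henon(xp), henon(xm)
        cols.append((xp - xm) / (2.0 * eps))
    return np.column_stack(cols)
```

The ordering of the factors matters: each new Jacobian, evaluated at the current orbit point, multiplies the accumulated product from the left, exactly as written in (A.7).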

With the appropriate substitutions, the Fisher information matrix becomes

$$ J(x_{n_0}) = \sum_{i=M}^{N} \sum_{j=M}^{N} D^T_{x(n_0)}\{h(f^{i-n_0}(x_{n_0}))\}\, Q(i,j)\, D_{x(n_0)}\{h(f^{j-n_0}(x_{n_0}))\} \tag{A.9} $$
where the matrix $Q(i,j)$ is given by

$$ Q(i,j) = E_{Y;x_{n_0}}\big\{ D^T\{\log p_v(v(i))\}\, D\{\log p_v(v(j))\} \big\}. \tag{A.10} $$

This decomposition effectively decouples the dependence of $J(x_{n_0})$ on the statistics of the noise $v(n)$ from its dependence on the dynamics of the system $f$, with the noise statistics reflected in the matrices $Q(i,j)$. When the observation noise is also Gaussian with covariance matrix $R$, we have $D\{\log p_v(v)\} = -v^T R^{-1}$, so that

$$ Q(i,j) = R^{-1}\delta_{i,j} \tag{A.11} $$

where $\delta_{i,j} = 1$ if $i = j$ and 0 otherwise, and $J(x_{n_0})$ reduces to the following:

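$$ J(x_{n_0}) = \sum_{i=M}^{N} D^T_{x(n_0)}\{h(f^{i-n_0}(x_{n_0}))\}\, R^{-1}\, D_{x(n_0)}\{h(f^{i-n_0}(x_{n_0}))\}. \tag{A.12} $$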
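Under the Gaussian-noise assumption, (A.12) reduces the Cramer-Rao bound to a sum of Jacobian products. A minimal sketch, assuming the hypothetical Hénon example, $h$ the identity, $R = \sigma^2 I$, and observations only at times $n_0, \ldots, N$ (so the backward products of (A.8) never appear); the function names are illustrative:

```python
import numpy as np

A_HENON, B_HENON = 1.4, 0.3   # hypothetical example map (Henon)

def henon(x):
    return np.array([1.0 - A_HENON * x[0] ** 2 + x[1], B_HENON * x[0]])

def henon_jacobian(x):
    return np.array([[-2.0 * A_HENON * x[0], 1.0], [B_HENON, 0.0]])

def fisher_information(x0, n_obs, R):
    """J(x_{n0}) from (A.12) with h = identity, M = n0 and N = n0 + n_obs - 1."""
    R_inv = np.linalg.inv(R)
    x = np.asarray(x0, dtype=float)
    D = np.eye(2)                  # D_{x(n0)}{f^0(x(n0))} is the identity
    J = np.zeros((2, 2))
    for _ in range(n_obs):
        J += D.T @ R_inv @ D       # one term of the sum in (A.12)
        D = henon_jacobian(x) @ D  # extend the product in (A.7) by one factor
        x = henon(x)
    return J

def cramer_rao_bound(x0, n_obs, R):
    """Inverse Fisher information: lower bound on the error covariance of any unbiased estimator."""
    return np.linalg.inv(fisher_information(x0, n_obs, R))

x0 = np.array([0.1, 0.1])
R = (0.05 ** 2) * np.eye(2)
crb_short = cramer_rao_bound(x0, 5, R)
crb_long = cramer_rao_bound(x0, 10, R)
```

Because each term added to (A.12) is positive semidefinite, adding observations can only shrink the bound in the positive semidefinite order; with a single observation the bound is simply $R$.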


A.2 Barankin Bound for Vector-Valued Parameters

The Barankin bound for unbiased estimators of scalar-valued parameters given by (5.19) has a counterpart, derived in [58, 59], for unbiased estimators of vector-valued parameters. For the problem scenario of interest in Chapter 5, in which we seek to bound $P(\hat{x}(n_0))$, the error covariance matrix of the unbiased estimator $\hat{x}(n_0)$ of $x_{n_0}$ given the observation set $Y$, the general form of the Barankin bound is the following:

$$ P(\hat{x}(n_0)) \ge X Q^{-1} X^T. \tag{A.13} $$

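In this equation, $X$ is the $\mathcal{N} \times m$ matrix given by

$$ X = [x_1(n_0) - x_{n_0},\ x_2(n_0) - x_{n_0},\ \ldots,\ x_m(n_0) - x_{n_0}], \tag{A.14} $$

where $\{x_i(n_0)\}_{i=1}^{m}$, the test points, denote values of $x(n_0)$ other than $x_{n_0}$, and $\mathcal{N}$ denotes the dimension of the state vector. Also, $Q$ is an $m \times m$ matrix with $ij$th element given by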

$$ Q_{ij} = E_{Y;x_{n_0}}\{L(Y; x_i(n_0), x_{n_0})\, L(Y; x_j(n_0), x_{n_0})\} \tag{A.15} $$

$$ = \int L(Y; x_i(n_0), x_{n_0})\, L(Y; x_j(n_0), x_{n_0})\, p(Y; x_{n_0})\, dY \tag{A.16} $$


where $L(Y; x_k(n_0), x_{n_0})$ is the likelihood ratio given by

$$ L(Y; x_k(n_0), x_{n_0}) = \frac{p(Y; x_k(n_0))}{p(Y; x_{n_0})}. \tag{A.17} $$


As shown in [59], by appropriately augmenting the original set of test points, one can derive a restricted form of the above bound that is expressible as the sum of two components: the inverse of the Fisher information matrix, and a positive semidefinite matrix that depends on the test points. In particular, this restricted form of the Barankin bound can be expressed as


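$$ P(\hat{x}(n_0)) \ge J^{-1}(x_{n_0}) + \big(X - J^{-1}(x_{n_0})A\big)\Lambda^{-1}\big(X - J^{-1}(x_{n_0})A\big)^T \tag{A.18} $$

where

$$ \Lambda = B - A^T J^{-1}(x_{n_0}) A \tag{A.19} $$

$$ J(x_{n_0}) = \text{the Fisher information matrix} = E_{Y;x_{n_0}}\big\{ D^T_{x(n_0)}\{\log p(Y; x_{n_0})\}\, D_{x(n_0)}\{\log p(Y; x_{n_0})\} \big\} \tag{A.20} $$

$$ A_{ij} = E_{Y;x_{n_0}}\Big\{ \frac{\partial \log p(Y; x_{n_0})}{\partial x(n_0,i)}\, L(Y; x_j(n_0), x_{n_0}) \Big\}, \quad i = 1,2,\ldots,\mathcal{N};\ j = 1,2,\ldots,m \tag{A.21} $$

$$ B_{ij} = E_{Y;x_{n_0}}\big\{ L(Y; x_i(n_0), x_{n_0})\, L(Y; x_j(n_0), x_{n_0}) \big\}, \quad i,j = 1,2,\ldots,m \tag{A.22} $$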

where $x(n_0) = [x(n_0,1), x(n_0,2), \ldots, x(n_0,\mathcal{N})]$. Since $J(x_{n_0})$ is the Fisher information matrix and the second term in (A.18) is positive semidefinite (as noted in [59]), this special form of the Barankin bound always provides a bound on the error covariance matrix that is at least as tight as the Cramer-Rao bound.

For the problem scenario of interest in Chapter 5, for the special case in which the output transformation $h$ is the identity operator and the observation noise is Gaussian with covariance matrix $R$, a straightforward derivation yields the following values for the elements of $A$ and $B$:


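$$ A_{\cdot j} = \sum_{k=M}^{N} D^T_{x(n_0)}\{f^{k-n_0}(x_{n_0})\}\, R^{-1} \big[f^{k-n_0}(x_j(n_0)) - f^{k-n_0}(x_{n_0})\big], \quad j = 1,2,\ldots,m \tag{A.23} $$

$$ B_{ij} = \exp\Big\{ \sum_{k=M}^{N} \big[f^{k-n_0}(x_i(n_0)) - f^{k-n_0}(x_{n_0})\big]^T R^{-1} \big[f^{k-n_0}(x_j(n_0)) - f^{k-n_0}(x_{n_0})\big] \Big\}, \quad i,j = 1,2,\ldots,m \tag{A.24} $$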


where $A_{\cdot j}$ denotes the $j$th column of $A$.
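The restricted Barankin bound (A.18), with $A$ and $B$ from (A.23) and (A.24), can be evaluated directly once the orbit segments of $x_{n_0}$ and of the test points are available. The sketch below assumes the same hypothetical Hénon example, $R = \sigma^2 I$, observations at times $n_0, \ldots, N$ only, and two arbitrarily chosen test points; `Lam` plays the role of the matrix inverted in (A.18). Function names and numerical values are illustrative.

```python
import numpy as np

A_HENON, B_HENON = 1.4, 0.3   # hypothetical example map (Henon)

def henon(x):
    return np.array([1.0 - A_HENON * x[0] ** 2 + x[1], B_HENON * x[0]])

def henon_jacobian(x):
    return np.array([[-2.0 * A_HENON * x[0], 1.0], [B_HENON, 0.0]])

def orbit_and_jacobians(x0, n_obs):
    """Rows f^k(x0) and matrices D{f^k(x0)} for k = 0, ..., n_obs - 1."""
    x, D = np.asarray(x0, dtype=float), np.eye(2)
    points, jacs = [], []
    for _ in range(n_obs):
        points.append(x)
        jacs.append(D)
        D = henon_jacobian(x) @ D
        x = henon(x)
    return np.array(points), jacs

def restricted_barankin_bound(x0, test_points, n_obs, sigma):
    """(A.18) with A and B from (A.23)-(A.24); h = identity, R = sigma^2 I, M = n0."""
    x0 = np.asarray(x0, dtype=float)
    R_inv = np.eye(2) / sigma ** 2
    traj0, jacs = orbit_and_jacobians(x0, n_obs)
    J = sum(D.T @ R_inv @ D for D in jacs)                     # (A.12)
    diffs = [orbit_and_jacobians(xt, n_obs)[0] - traj0 for xt in test_points]
    A = np.column_stack([sum(jacs[k].T @ R_inv @ d[k] for k in range(n_obs))
                         for d in diffs])                      # (A.23), one column per test point
    B = np.array([[np.exp(np.sum(di * dj) / sigma ** 2) for dj in diffs]
                  for di in diffs])                            # (A.24)
    X = np.column_stack([np.asarray(xt, dtype=float) - x0 for xt in test_points])  # (A.14)
    J_inv = np.linalg.inv(J)
    Lam = B - A.T @ J_inv @ A                                  # (A.19)
    G = X - J_inv @ A
    bound = J_inv + G @ np.linalg.solve(Lam, G.T)
    return 0.5 * (bound + bound.T), J_inv

x0 = np.array([0.1, 0.1])
tests = [x0 + np.array([0.05, 0.0]), x0 + np.array([0.0, 0.05])]
bound, crb = restricted_barankin_bound(x0, tests, 3, 0.1)
```

The difference between the returned bound and the Cramer-Rao bound is positive semidefinite, which is the property discussed after (A.18). Test points far from $x_{n_0}$ or long observation windows make the entries of $B$ grow exponentially, so in practice the offsets must be kept modest relative to $\sigma$.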


A.3 Weiss-Weinstein Bound for Vector-Valued Parameters

Analogous to the Barankin bound, the Weiss-Weinstein bound for estimators of scalar-valued, random parameters given by (5.4.2) has a counterpart, derived in [90], for estimators of vector-valued, random parameters. For the problem scenario of interest in Chapter 5, in which we seek to bound $P_r(\hat{x}(0))$, the error covariance matrix for estimators of the random initial condition $x(0)$ given the observation set $Y$, the general form of the Weiss-Weinstein bound is the following:

$$ P_r(\hat{x}(0)) \ge Z W^{-1} Z^T. \tag{A.25} $$

In this equation, $Z$ is the $\mathcal{N} \times m$ matrix given by

$$ Z = [z_1, \ldots, z_m], \tag{A.26} $$

with each $z_i$ an $\mathcal{N}$-dimensional vector (a test offset), and $W$ is an $m \times m$ matrix with $ij$th element $W_{ij}$ given by

$$ W_{ij} = E_{Y,x(0)}\{\xi_i(Y, x(0))\, \xi_j(Y, x(0))\} \tag{A.27} $$

where

$$ \xi_k(Y, x(0)) = \frac{L^{s_k}(Y, x(0)+z_k, x(0)) - L^{1-s_k}(Y, x(0)-z_k, x(0))}{E_{Y,x(0)}\{L^{1-s_k}(Y, x(0)-z_k, x(0))\}} \tag{A.28} $$

$$ L(Y, x_1(0), x_2(0)) = \frac{p(Y, x_1(0))}{p(Y, x_2(0))} \tag{A.29} $$

and where $0 < s_k < 1$. In the equations that follow, we set each $s_k$ equal to 0.5, as suggested in [90].

As with the Barankin bound, and although this has not been shown in the literature, with the appropriate selection of test offsets one can derive a restricted form of the Weiss-Weinstein bound that is expressible as the sum of two components: the inverse of the Fisher information matrix for random parameters, and a function of the test offsets, the number of observations, and the observation noise intensity. To do so, one augments the given set of test offsets with $\mathcal{N}$ additional offsets $z_{m+j} = z e_j$ for $j = 1, 2, \ldots, \mathcal{N}$, where $z$ is a scalar and $e_j$ is the $j$th unit vector in $\mathbb{R}^{\mathcal{N}}$. In the limit $z \to 0$, the Weiss-Weinstein bound approaches the following:

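$$ P_r(\hat{x}(0)) \ge J_R^{-1}(x(0)) + \big(Z - J_R^{-1}(x(0))A\big)\Lambda^{-1}\big(Z - J_R^{-1}(x(0))A\big)^T \tag{A.30} $$

where

$$ J_R(x(0)) = \text{the Fisher information matrix for } x(0) = E_{Y,x(0)}\big\{ D^T_{x(0)}\{\log p(Y, x(0))\}\, D_{x(0)}\{\log p(Y, x(0))\} \big\} \tag{A.31} $$

$$ Z = \text{the } \mathcal{N} \times m \text{ matrix of test offsets} = [z_1, z_2, \ldots, z_m] \tag{A.32} $$

$$ \Lambda = B - A^T J_R^{-1}(x(0)) A \tag{A.33} $$

$$ A_{ij} = E_{Y,x(0)}\Big\{ \frac{\partial \log p(Y, x(0))}{\partial x_i(0)}\, \xi_j(Y, x(0)) \Big\}, \quad i = 1,2,\ldots,\mathcal{N};\ j = 1,2,\ldots,m \tag{A.34} $$

$$ B_{ij} = E_{Y,x(0)}\{\xi_i(Y, x(0))\, \xi_j(Y, x(0))\}, \quad i,j = 1,2,\ldots,m \tag{A.35} $$

where $x(0) = [x_1(0), x_2(0), \ldots, x_{\mathcal{N}}(0)]$.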

For the problem scenario of interest in Chapter 5, for the special case in which the observation noise covariance matrix $R$ equals $\sigma^2 I_{\mathcal{N}}$ and the output transformation $h$ is the identity operator, a straightforward derivation yields the following values for the elements of $A$ and $B$:


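$$ E_{Y,x(0)}\{L^{.5}(Y, x(0)-z_i, x(0))\} = E_{x(0)}\left\{ \left[\frac{p(x(0)-z_i)}{p(x(0))}\right]^{.5} \exp\left[-\frac{1}{8\sigma^2}\sum_{k=0}^{N}\big\|f^k(x(0)-z_i) - f^k(x(0))\big\|^2\right] \right\}, \quad i = 1,2,\ldots,m \tag{A.36} $$

$$
\begin{aligned}
A_{\cdot j} = \frac{1}{E_{Y,x(0)}\{L^{.5}(Y, x(0)-z_j, x(0))\}}\, E_{x(0)}\Bigg\{ & \left[\frac{p(x(0)+z_j)}{p(x(0))}\right]^{.5} \exp\left[-\frac{1}{8\sigma^2}\sum_{k=0}^{N}\big\|f^k(x(0)+z_j) - f^k(x(0))\big\|^2\right] \\
& \quad \times \left[\sum_{k=0}^{N} \frac{D^T_{x(0)}\{f^k(x(0))\}\,\big[f^k(x(0)+z_j) - f^k(x(0))\big]}{2\sigma^2} + D^T_{x(0)}\{\log p(x(0))\}\right] \\
& - \left[\frac{p(x(0)-z_j)}{p(x(0))}\right]^{.5} \exp\left[-\frac{1}{8\sigma^2}\sum_{k=0}^{N}\big\|f^k(x(0)-z_j) - f^k(x(0))\big\|^2\right] \\
& \quad \times \left[\sum_{k=0}^{N} \frac{D^T_{x(0)}\{f^k(x(0))\}\,\big[f^k(x(0)-z_j) - f^k(x(0))\big]}{2\sigma^2} + D^T_{x(0)}\{\log p(x(0))\}\right] \Bigg\}, \quad j = 1,2,\ldots,m
\end{aligned} \tag{A.37}
$$

$$
\begin{aligned}
B_{ij} = \frac{1}{E_{Y,x(0)}\{L^{.5}(Y, x(0)-z_i, x(0))\}\, E_{Y,x(0)}\{L^{.5}(Y, x(0)-z_j, x(0))\}}\, E_{x(0)}\Bigg\{ & \frac{[p(x(0)+z_i)\,p(x(0)+z_j)]^{.5}}{p(x(0))} \exp\left[-\frac{1}{8\sigma^2}\sum_{k=0}^{N}\big\|f^k(x(0)+z_i) - f^k(x(0)+z_j)\big\|^2\right] \\
& - \frac{[p(x(0)+z_i)\,p(x(0)-z_j)]^{.5}}{p(x(0))} \exp\left[-\frac{1}{8\sigma^2}\sum_{k=0}^{N}\big\|f^k(x(0)+z_i) - f^k(x(0)-z_j)\big\|^2\right] \\
& - \frac{[p(x(0)-z_i)\,p(x(0)+z_j)]^{.5}}{p(x(0))} \exp\left[-\frac{1}{8\sigma^2}\sum_{k=0}^{N}\big\|f^k(x(0)-z_i) - f^k(x(0)+z_j)\big\|^2\right] \\
& + \frac{[p(x(0)-z_i)\,p(x(0)-z_j)]^{.5}}{p(x(0))} \exp\left[-\frac{1}{8\sigma^2}\sum_{k=0}^{N}\big\|f^k(x(0)-z_i) - f^k(x(0)-z_j)\big\|^2\right] \Bigg\}, \quad i,j = 1,2,\ldots,m
\end{aligned} \tag{A.38}
$$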
where $A_{\cdot j}$ denotes the $j$th column of $A$.
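The exponential factors of the form $\exp[-\|\cdot\|^2/(8\sigma^2)]$ in the Weiss-Weinstein expressions for $A$ and $B$ come from a standard Gaussian fact used with $s_k = 0.5$: for two scalar Gaussian densities with means $a$ and $b$ and common variance $\sigma^2$, the integral of the square root of their product equals $\exp[-(a-b)^2/(8\sigma^2)]$. With independent observation components, one such factor arises per component and per time, which produces the sums over $k$. The minimal sketch below checks the scalar identity by numerical integration; the function names and test values are hypothetical.

```python
import numpy as np

def gauss_pdf(y, mean, sigma):
    """Scalar Gaussian density with the given mean and standard deviation."""
    return np.exp(-0.5 * ((y - mean) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

def half_power_overlap(a, b, sigma, n_grid=200_001):
    """Integrate sqrt(N(y; a, sigma^2) N(y; b, sigma^2)) over y with a fine Riemann sum."""
    y = np.linspace(min(a, b) - 12.0 * sigma, max(a, b) + 12.0 * sigma, n_grid)
    dy = y[1] - y[0]
    return float(np.sum(np.sqrt(gauss_pdf(y, a, sigma) * gauss_pdf(y, b, sigma))) * dy)

def closed_form_overlap(a, b, sigma):
    """exp[-(a - b)^2 / (8 sigma^2)], the factor contributed by one scalar observation."""
    return float(np.exp(-(a - b) ** 2 / (8.0 * sigma ** 2)))

numeric = half_power_overlap(0.3, -0.2, 0.25)
exact = closed_form_overlap(0.3, -0.2, 0.25)
```

The integration range extends twelve standard deviations beyond both means, so the truncated tails are far below double precision.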
Appendix B


Proofs  for  Chapters  6  and  7 


Proof of Proposition 1: Given a matrix of rational-valued state transition probabilities $P = [p_{ij}]$ and a row vector $\Pi(0) = [\pi_1(0), \ldots, \pi_m(0)]$ of nonzero, rational-valued initial state probabilities, where $m$ is the number of states, we synthesize a piecewise linear Markov map $f$ using the procedure outlined in Section 6.3. Because each initial state probability $\pi_j(0)$ is rational-valued and equals $\lambda(I_j)$, the length of the corresponding subinterval, the endpoints of the subintervals $I_j$ are rational-valued. Also, because each state transition probability $p_{jk}$ is rational-valued, the slope $\tau_{jk}$ of the affine transformation that maps $I_{jk}$ onto $I_k$, which is given by $\tau_{jk} = \lambda(I_k)/\lambda(I_{jk})$, is rational-valued as well. Because it is rational-valued, $\tau_{jk} = n_{jk}/d_{jk}$, where $n_{jk}$ and $d_{jk}$ are integers.

As in the outline of the synthesis procedure, let $(e_{jk,l}, e_{k,l})$ denote the left endpoint of this affine segment in $(x,y)$-coordinates, and let $(e_{jk,r}, e_{k,r})$ denote the right endpoint, when the affine segment is treated as a line segment in the $(x,y)$-plane. We now replace the affine transformation mapping $I_{jk}$ onto $I_k$ by $d_{jk}$ affine transformations, with the slope of each transformation equal to $d_{jk}\tau_{jk} = n_{jk}$, hence an integer, and with the left and right endpoint pairs of the $\ell$th such transformation given by $(e_{jk,l} + \ell\lambda(I_{jk})/d_{jk},\ e_{k,l})$ and $(e_{jk,l} + (\ell+1)\lambda(I_{jk})/d_{jk},\ e_{k,r})$ for $\ell = 0, 1, \ldots, d_{jk}-1$. In effect, we have replaced the original affine transformation with $d_{jk}$ affine transformations, each of which maps a subinterval of length $\lambda(I_{jk})/d_{jk}$ onto $I_k$. Thus, the range of each transformation is identical to that of the original transformation. We do this to each of the affine transformations so that each transformation of the resulting map has integer slope. If we seek to guarantee that the map is an EMC map, we make a similar substitution for any affine segments with slopes of 1 or -1; we are free to specify the number of affine segment replacements for these segments.
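The segment replacement just described is mechanical, and a short sketch with exact rational arithmetic makes it concrete. Under the assumption that a segment is given by its left and right endpoints in the $(x,y)$-plane as Python `Fraction`s, the function below splits it into $d_{jk}$ pieces of integer slope $n_{jk}$; the optional `multiplier` argument (a hypothetical name) models the extra replacements one may choose for segments of slope $\pm 1$ when an EMC map is wanted.

```python
from fractions import Fraction

def split_affine_segment(x_left, x_right, y_left, y_right, multiplier=1):
    """Replace the affine segment from (x_left, y_left) to (x_right, y_right), whose slope
    tau = n/d is rational, by d * multiplier pieces of integer slope, each mapping a
    subinterval of length (x_right - x_left) / (d * multiplier) onto the same range."""
    width = x_right - x_left
    tau = (y_right - y_left) / width          # Fraction, automatically in lowest terms
    pieces_count = tau.denominator * multiplier
    pieces = []
    for ell in range(pieces_count):
        left = x_left + ell * width / pieces_count
        right = x_left + (ell + 1) * width / pieces_count
        pieces.append(((left, y_left), (right, y_right)))
    return pieces

def slope(piece):
    """Slope of a piece given as ((x0, y0), (x1, y1))."""
    (x0, y0), (x1, y1) = piece
    return (y1 - y0) / (x1 - x0)

# Slope 2/3 on [0, 1/2] onto [0, 1/3] becomes three pieces of slope 2.
pieces = split_affine_segment(Fraction(0), Fraction(1, 2), Fraction(0), Fraction(1, 3))
# A decreasing segment of slope -2 split twice more, as one might do for an EMC map.
pieces_neg = split_affine_segment(Fraction(1, 2), Fraction(1), Fraction(1), Fraction(0), multiplier=2)
```

Every piece keeps the original $y$-endpoints, so the range of each replacement equals the range of the original transformation, which is the property the proof relies on.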

We must show that state sequences that arise with the transformed map are those of a Markov chain with the same TPM as that of the original map. We use a minor adaptation of the proof provided in [61] for the case in which there is a single affine transformation between pairs of states. As in the proof in [61], it is sufficient to show that for each positive integer $n$

$$ \frac{\lambda(I_{i_0 i_1 \cdots i_n})}{\lambda(I_{i_0 i_1})} = \frac{\lambda(I_{i_1 \cdots i_n})}{\lambda(I_{i_1})} \tag{B.1} $$

where

$$ I_{i_0 i_1 \cdots i_n} = \{x : x \in I_{i_0},\ f(x) \in I_{i_1},\ \ldots,\ f^n(x) \in I_{i_n}\}. \tag{B.2} $$


The sufficiency of the condition follows from two facts. First, by definition of a Markov process, state sequences defined on the partition $\{I_j\}$ are those of a Markov chain if the following is true:

$$ \frac{\lambda(I_{i_0 i_1 \cdots i_n})}{\lambda(I_{i_0})} = \frac{\lambda(I_{i_0 i_1})}{\lambda(I_{i_0})}\,\frac{\lambda(I_{i_1 i_2})}{\lambda(I_{i_1})} \cdots \frac{\lambda(I_{i_{n-1} i_n})}{\lambda(I_{i_{n-1}})}. \tag{B.3} $$

However, if (B.3) holds, (B.1) holds as well. Second, using an inductive argument, one can show that (B.1) implies (B.3). Therefore, (B.1) and (B.3) are equivalent conditions, and establishing the validity of (B.1) is sufficient for establishing that state sequences that arise with the transformed map are those of a Markov chain.

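To establish the validity of (B.1), we begin by inductively showing that

$$ f(I_{i_0 i_1 \cdots i_n}) = I_{i_1 \cdots i_n} \quad \text{almost everywhere} \tag{B.4} $$

whenever $I_{i_0 i_1 \cdots i_n}$ is nonempty. Since $f(I_{ijk}) \subseteq I_{jk}$, it follows that $\lambda(f(I_{ijk})) \le \lambda(I_{jk})$. Also, since $I_{ij} = \bigcup_k I_{ijk}$, it follows that $f(I_{ij}) = \bigcup_k f(I_{ijk})$, and hence

$$ I_j = f(I_{ij}) = \bigcup_k f(I_{ijk}). \tag{B.5} $$

Using the facts that $f(I_{ijk}) \cap f(I_{ijl}) = \emptyset$ for $k \ne l$ (except possibly for endpoints) and that the restriction of $f$ to $I_{ij}$ consists of $d_{ij}$ affine transformations, each with slope $n_{ij}$, results in the following chain of equalities: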


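$$ \lambda(I_j) = \lambda\Big(\bigcup_k f(I_{ijk})\Big) \tag{B.6} $$

$$ = \sum_k \lambda(f(I_{ijk})) \tag{B.7} $$

$$ = \sum_k \frac{n_{ij}}{d_{ij}}\,\lambda(I_{ijk}) \tag{B.8} $$

$$ = \sum_k \tau_{ij}\,\lambda(I_{ijk}). \tag{B.9} $$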


The third equality follows from the fact that we have replaced the original affine transformation with slope $\tau_{ij}$, which maps $I_{ij}$ onto $I_j$, by $d_{ij}$ affine transformations of slope $n_{ij}$, each mapping a subinterval of $I_{ij}$ of length $\lambda(I_{ij})/d_{ij}$ onto $I_j$. Since the range of each of the transformations is identical, a subinterval of $I_{ijk}$ of length $\lambda(I_{ijk})/d_{ij}$ lies in the domain of each of the $d_{ij}$ transformations. Equivalently, $f(I_{ijk})$ coincides with the image under $f$ of each of these $d_{ij}$ subintervals of $I_{ijk}$, and because $f$ is piecewise linear, $\lambda(f(I_{ijk})) = n_{ij}\lambda(I_{ijk})/d_{ij}$ is the length of the image under $f$ (with slope $n_{ij}$) of one of these subintervals.

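Now $\lambda(I_j) = \sum_k \lambda(I_{jk})$ and $\lambda(f(I_{ijk})) = \tau_{ij}\lambda(I_{ijk}) \le \lambda(I_{jk})$. Therefore,

$$ \lambda(I_j) = \sum_k \lambda(I_{jk}) = \sum_k \tau_{ij}\,\lambda(I_{ijk}) \tag{B.10} $$

can be true only if $\lambda(I_{jk}) = \tau_{ij}\lambda(I_{ijk})$ for each $k$. Since $f$ is piecewise linear on $I_{ij}$, with each of the $d_{ij}$ affine pieces having slope $n_{ij}$, a domain of length $\lambda(I_{ij})/d_{ij}$, and range $I_j$, this is equivalent to

$$ f(I_{ijk}) = I_{jk} \quad \text{almost everywhere.} \tag{B.11} $$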

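To complete the induction, we assume that for all nonempty $I_{i_0 \cdots i_l}$ the following is true:

$$ f(I_{i_0 \cdots i_l}) = I_{i_1 \cdots i_l} \tag{B.12} $$

for all $l \le n-1$. Consider any nonempty set $I_{i_0 \cdots i_n}$. Then, proceeding as above, we have $f(I_{i_0 \cdots i_n}) \subseteq I_{i_1 \cdots i_n}$, and therefore $\lambda(f(I_{i_0 \cdots i_n})) = \tau_{i_0 i_1}\lambda(I_{i_0 \cdots i_n}) \le \lambda(I_{i_1 \cdots i_n})$. However, $I_{i_0 \cdots i_{n-1}} = \bigcup_{i_n} I_{i_0 \cdots i_{n-1} i_n}$, and thus $f(I_{i_0 \cdots i_{n-1}}) = \bigcup_{i_n} f(I_{i_0 \cdots i_{n-1} i_n})$. According to the induction hypothesis, this implies

$$ I_{i_1 \cdots i_{n-1}} = \bigcup_{i_n} f(I_{i_0 \cdots i_{n-1} i_n}) \tag{B.13} $$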
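and thus the following is true:

$$ \lambda(I_{i_1 \cdots i_{n-1}}) = \sum_{i_n} \lambda(f(I_{i_0 \cdots i_n})) \tag{B.14} $$

$$ = \tau_{i_0 i_1} \sum_{i_n} \lambda(I_{i_0 \cdots i_n}). \tag{B.15} $$

Using the fact that $I_{i_1 \cdots i_{n-1}} = \bigcup_{i_n} I_{i_1 \cdots i_n}$ leads to the following equality:

$$ \sum_{i_n} \lambda(I_{i_1 \cdots i_n}) = \sum_{i_n} \tau_{i_0 i_1}\,\lambda(I_{i_0 \cdots i_n}). \tag{B.16} $$

Since no individual term in the right member of the above equation can exceed the corresponding term in the left member, the equation is valid only if

$$ \lambda(I_{i_1 \cdots i_n}) = \tau_{i_0 i_1}\,\lambda(I_{i_0 \cdots i_n}). \tag{B.17} $$

Because $f$ is piecewise linear on $I_{i_0 i_1}$, and using an argument similar to the one above, we must have

$$ f(I_{i_0 \cdots i_n}) = I_{i_1 \cdots i_n} \quad \text{almost everywhere.} \tag{B.18} $$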

This completes the induction. Now observe that

$$ \lambda(I_{i_0 \cdots i_n}) = \frac{\lambda(I_{i_1 \cdots i_n})}{\tau_{i_0 i_1}}. \tag{B.19} $$

But $\tau_{ij} = \lambda(I_j)/\lambda(I_{ij})$ whenever $I_{ij}$ is nonempty. Therefore, (B.19) may be expressed as

$$ \frac{\lambda(I_{i_0 \cdots i_n})}{\lambda(I_{i_0 i_1})} = \frac{\lambda(I_{i_1 \cdots i_n})}{\lambda(I_{i_1})}, \tag{B.20} $$

which is sufficient to guarantee that the state sequence is that of a Markov chain. To establish that the transition probabilities are the same as those of the original map, we note that the portion of partition element $I_j$ mapped onto partition element $I_k$ is given by $I_{jk}$ for both maps. Finally, we note that the proof still holds if $\lambda$ is replaced by any probability measure on the unit interval for which there is a corresponding PDF that is constant on each partition element. □
Proof of Proposition 2: Let $\beta$ be the integer-valued least common denominator of the endpoints of the affine segments and the images of these endpoints. Such a $\beta$ exists since these points are all rational-valued by assumption. Consider the $\beta$-element uniform partition with the set of partition points $P = \{i/\beta\}_{i=0}^{\beta}$, with each partition element having length $1/\beta$. The endpoints of the affine segments and their images are all partition points since $\beta$ is the least common denominator of these points. Thus, both the domain and range of each affine segment consist of connected unions of partition elements, i.e., the domain and range are subintervals.

The restriction of the map $f$ to the domain of each affine segment is an affine transformation and thus expressible as $x(n+1) = f(x(n)) = a x(n) + b$, where $a$ is integer-valued by assumption. The left endpoint of the affine segment and its image are given by $i/\beta$ and $j/\beta$, respectively, for some integers $i$ and $j$, since they are partition points. It follows that the endpoints of the other partition elements in the domain of the affine segment are given by $(i+k)/\beta$ for some positive integer $k$. For each such endpoint we have the following:

$$ f\Big(\frac{i+k}{\beta}\Big) = a\,\frac{i+k}{\beta} + b \tag{B.21} $$

$$ = \Big(a\,\frac{i}{\beta} + b\Big) + \frac{ak}{\beta} \tag{B.22} $$

$$ = f\Big(\frac{i}{\beta}\Big) + \frac{ak}{\beta} \tag{B.23} $$

$$ = \frac{j + ak}{\beta}. \tag{B.24} $$

Since $j$, $a$, and $k$ are all integer-valued, the image of $(i+k)/\beta$ is also a partition point. This condition holds for the partition points in the domain of each affine segment and thus for all partition points. Therefore, each partition point is mapped to a partition point. Also, because the restriction of $f$ to the domain of each affine segment is affine, and thus continuous, and the image of an interval under a continuous one-dimensional mapping is also an interval, each partition element is mapped to a union of partition elements.
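The first part of the proof can be checked mechanically with exact rational arithmetic. The sketch below assumes each affine segment is given as a tuple `(x_left, x_right, a, b)` with $f(x) = ax + b$ on `[x_left, x_right]`; it forms $\beta$ as in the proof (optionally multiplied by a refinement factor $C$, as in the second part) and tests whether every partition point in each domain maps to a partition point. The function name and the example maps are hypothetical; the second example deliberately violates the integer-slope assumption.

```python
from fractions import Fraction
from math import lcm

def maps_partition_points_to_partition_points(segments, refinement=1):
    """segments: (x_left, x_right, a, b) with f(x) = a*x + b on [x_left, x_right] and
    rational endpoints. Uses beta = refinement * (least common denominator of the
    endpoints and their images) and checks every partition point i/beta in each domain."""
    values = []
    for x_left, x_right, a, b in segments:
        values += [x_left, x_right, a * x_left + b, a * x_right + b]
    beta = refinement * lcm(*(Fraction(v).denominator for v in values))
    for x_left, x_right, a, b in segments:
        for i in range(int(x_left * beta), int(x_right * beta) + 1):
            image = a * Fraction(i, beta) + b
            if (image * beta).denominator != 1:   # image is not a multiple of 1/beta
                return False
    return True

# Integer slopes 2, -4, 4 with rational breakpoints 1/2 and 3/4 (beta = 4).
good = [(Fraction(0), Fraction(1, 2), 2, 0),
        (Fraction(1, 2), Fraction(3, 4), -4, 3),
        (Fraction(3, 4), Fraction(1), 4, -3)]
# A non-integer slope 3/2 on [0, 2/3]: the point 1/3 maps to 1/2, which is not a multiple of 1/3.
bad = [(Fraction(0), Fraction(2, 3), Fraction(3, 2), 0),
       (Fraction(2, 3), Fraction(1), -3, 3)]
```

The failing example shows why the integer-slope hypothesis matters: with slope $3/2$, step (B.24) would produce $(j + ak)/\beta$ with non-integer $ak$.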

For the second part of the proof, let $P'$ be the partition points for a uniform refinement of the original uniform partition. Because the refinement is also a uniform partition, it follows that $P' = \{i/\gamma\}_{i=0}^{\gamma}$ for some integer $\gamma$. Because $1/\beta$ must belong to $P'$, $\gamma = C\beta$ for some integer $C$, and the partition points are expressible as $P' = \{i/(C\beta)\}_{i=0}^{C\beta}$. We can now apply the first part of the proof to this partition, since the endpoints of the affine segments and their images are also partition points, each given by $i/(C\beta)$ for some $i$, and the domain of each affine segment is a union of partition elements. □


Proof of Proposition 3: a. We first show that if an EMC map $f$ gives rise to a Markov chain with irreducible TPM, the EMC map is ergodic and each subinterval of the unit interval has nonzero (invariant) measure. If $f$ gives rise to a Markov chain with irreducible TPM, then $f$ is a class C function as defined in [12] and, by Theorem 1 of the same reference, has a unique invariant measure that is absolutely continuous with respect to Lebesgue measure, and thus has a unique stationary PDF $p_f$. Furthermore, this PDF is nonzero on each subinterval, as is easily verified as follows. As shown in [12], given any Markov partition for a class C function, the Frobenius-Perron operator restricted to PDFs that are constant over each partition element is a linear operator and thus can be represented by a matrix $M$. That is, if $p(x)$ denotes a piecewise constant PDF satisfying $p(x) = p_j$ for all $x \in I_j$ and $\mathbf{p} = [p_1, \ldots, p_N]$ is a row vector, then $P_f(p(x))$, the Frobenius-Perron operator applied to $p(x)$, is given by $\mathbf{p}M$. It is straightforward to show that this matrix is related to the TPM $P$ of the Markov chain corresponding to the Markov partition by the following similarity transformation:

$$ M = D P D^{-1} \tag{B.25} $$

where $D$ is a diagonal matrix with $i$th diagonal element given by $\lambda(I_i)$. If $P$ is irreducible, it has a unique invariant probability vector $\Pi$ with no zero-valued elements. Therefore, $M$ has a unique (up to scaling) invariant row vector with no zero-valued elements, given by $\Pi D^{-1}$, since $\Pi D^{-1} M = \Pi P D^{-1} = \Pi D^{-1}$. Because it is invariant, this vector corresponds to a fixed point of the Frobenius-Perron operator, and thus its elements are the constant PDF values of the partition elements for the unique stationary PDF of $f$. Since $f$ has a unique stationary PDF that is nonzero over the unit interval (except possibly at the endpoints of partition elements), $f$ is ergodic by [50, p. 55, Theorem 4.2.2].
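The relation (B.25) and the invariant vector $\Pi D^{-1}$ are easy to check numerically. The sketch below assumes a hypothetical two-state irreducible TPM and hypothetical partition-element lengths (it does not construct the underlying map); it builds $M = DPD^{-1}$, computes $\Pi$ as the left eigenvector of $P$ for eigenvalue 1, and confirms that $\Pi D^{-1}$ is a fixed point of $M$ that integrates to one.

```python
import numpy as np

def frobenius_perron_matrix(P, lengths):
    """M = D P D^{-1} from (B.25), with D = diag(lambda(I_1), ..., lambda(I_N))."""
    D = np.diag(lengths)
    return D @ P @ np.linalg.inv(D)

def stationary_distribution(P):
    """Invariant probability row vector of an irreducible TPM (left eigenvector for eigenvalue 1)."""
    eigvals, eigvecs = np.linalg.eig(P.T)
    k = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, k])
    return pi / pi.sum()                     # normalizing also fixes the arbitrary sign

P = np.array([[0.5, 0.5], [0.25, 0.75]])     # hypothetical irreducible TPM
lengths = np.array([0.4, 0.6])               # hypothetical partition-element lengths
M = frobenius_perron_matrix(P, lengths)
pi = stationary_distribution(P)
density = pi / lengths                       # Pi D^{-1}: constant PDF value on each I_j
```

Here $\Pi = [1/3, 2/3]$, so the stationary PDF takes the values $(1/3)/0.4$ and $(2/3)/0.6$ on the two partition elements, both nonzero as the proof requires.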

We now show that if an EMC map $f$ is ergodic with respect to an invariant measure $\mu_f$ with corresponding PDF $p_f$ that is nonzero almost everywhere, each Markov chain it gives rise to has an irreducible TPM. Let $\{I_j\}_{j=1}^{N}$ denote the elements of any Markov partition, $\{S_j\}_{j=1}^{N}$ denote the states of the associated Markov chain, and $P = [p_{ij}]$ denote the corresponding TPM. If $P$ is not irreducible, there exist two states $S_i$ and $S_j$ such that it is impossible ever to reach $S_j$ from $S_i$, or equivalently $p^{(n)}_{ij} = 0$ for all positive integers $n$, where $p^{(n)}_{ij}$ is the $ij$th element of $P^n$, the $n$-step TPM. Because of the relation between states and partition elements, this means that $f^n(I_i) \cap I_j = \emptyset$ (up to a set of measure zero) for all positive integers $n$. Therefore, for all $x \in I_i$, and with $1_{I_j}$ denoting the indicator function of $I_j$, the following holds:

$$ \lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} 1_{I_j}(f^k(x)) = 0 \tag{B.26} $$

$$ < \int 1_{I_j}(x)\, p_f(x)\, dx = \mu_f(I_j) \tag{B.27} $$

where the inequality holds because $\mu_f(I_j) > 0$, since $p_f(x)$ is nonzero almost everywhere. Since $\mu_f(I_i) > 0$ as well, this inequality contradicts the Birkhoff ergodic theorem, which requires the time average on the left to equal the space average on the right for $\mu_f$-almost every $x$. Therefore, $P$ must be irreducible.
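Irreducibility of a TPM depends only on its zero pattern, so it can be tested by computing which states can reach which. A minimal sketch, assuming the TPM is given as a nested list or NumPy array (the function name is hypothetical); it uses an iterative boolean reachability closure rather than matrix powers, which avoids integer overflow for larger state spaces:

```python
import numpy as np

def is_irreducible(P):
    """True if every state of the TPM P can reach every other state."""
    adj = np.asarray(P) > 0
    m = adj.shape[0]
    reach = np.eye(m, dtype=bool) | adj            # paths of length 0 or 1
    for _ in range(m):
        # extend every known path by one transition
        extended = reach | ((reach.astype(int) @ adj.astype(int)) > 0)
        if np.array_equal(extended, reach):
            break
        reach = extended
    return bool(reach.all())
```

A periodic chain such as $[[0, 1], [1, 0]]$ is irreducible, while a chain with an absorbing state that cannot leave, such as $[[1, 0], [0.5, 0.5]]$, is not: state $S_2$ can never be reached from $S_1$, which is exactly the situation ruled out above.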

Note that, as a consequence of the proof, for an EMC map (and more generally for an MC map) to be ergodic with respect to an invariant measure having a corresponding PDF, the PDF can be zero only over subintervals that are partition elements in Markov partitions.

b. We first show that if an EMC map $f$ gives rise to a Markov chain with a primitive TPM, the map is exact. Let $p_f$ denote the stationary PDF of $f$ and $\mu_f$ denote the measure this PDF gives rise to. From [50, p. 66, Theorem 4.4.1.c], it suffices to show that the following holds for each density function of $\mu_f$ (where $g \in L^1(\mu_f)$ is a density function if it is nonnegative and satisfies $\int g(x)\, d\mu_f(x) = 1$):

$$ \lim_{n \to \infty} \|P_f^n(g) - 1\| = \lim_{n \to \infty} \int \big|P_f^n(g(x)) - 1\big|\, d\mu_f(x) = 0 \tag{B.28} $$

where $P_f^n$ is the Frobenius-Perron operator of $f^n$. The above condition means that the Frobenius-Perron operator applied to any density function (with respect to $\mu_f$) converges strongly to the constant 1. Intuitively, this means that any initial PDF for the map $f$ converges under the dynamics of $f$ to the stationary PDF. As noted in [50, p. 69], it suffices to prove the above condition for all $g$ in a linearly dense subset of the set of density functions. Because $\mu_f$ is absolutely continuous with respect to Lebesgue measure on the unit interval, a linearly dense subset of density functions consists of all normalized characteristic functions $k_\beta(x)$, where $k_\beta(x) = 1/\lambda(\beta)$ if $x \in \beta$ and 0 otherwise, and $\beta$ is a subinterval of the unit interval.

Two facts that simplify the proof follow directly from more general results in [24]. First, any MC map that gives rise to a Markov chain with primitive TPM has a dense set of eventually periodic points. Second, if an MC map gives rise to a Markov chain with primitive TPM, then all Markov chains that the map gives rise to have primitive TPMs as well. In light of the first fact, $f$ gives rise to arbitrarily fine Markov partitions, since one can find a Markov partition that includes any given eventually periodic point $x$. In particular, one starts with any Markov partition and uses the finite set of distinct points $\{f^k(x)\}$ as additional partition points. In addition, the set of subintervals with eventually periodic points of $f$ as endpoints is dense in the set of all subintervals of the unit interval. Therefore, to verify (B.28) for all density functions, it suffices to verify it for normalized characteristic functions $k_\alpha(x)$, where $\alpha$ is a subinterval with eventually periodic points of $f$ as endpoints.

The  following  lemma  follows  as  a  consequence  of  [12,  Theorem  3]  which  establishes  the 
relation  used  earlier  between  the  TPM  of  the  Markov  chain  which  arises  from  a  Markov 
partition  and  the  Frobenius- Perron  operator  restricted  to  PDFs  that  are  constant  over  each 
partition  element. 

Lemma:  The  time  evolution  under  the  dynamics  of  /  of  any  piecewise  constant  PDF, 
where  the  endpoints  of  the  subintervals  with  constant  PDF  values  are  eventually  periodic 
points  of  /,  is  uniquely  determined  by  the  time  evolution  of  the  TPM  of  a  Markov  chain. 

Sketch of Proof: Because the endpoints are eventually periodic points, one can find a Markov partition with these endpoints among the partition points, and the PDF is constant over partition elements. Therefore, from [12, Theorem 3] and as discussed earlier, the Frobenius-Perron operator restricted to PDFs that are piecewise constant over partition elements has a matrix representation which is related to the TPM of the corresponding Markov chain by a similarity transformation involving a diagonal matrix (with the lengths of the partition elements as the diagonal terms). Therefore, the time evolution of a piecewise constant PDF under the dynamics of $f$ is uniquely determined by the time evolution of the TPM of the Markov chain. □
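The similarity relation in the sketch can be illustrated numerically. The following is a minimal sketch built on several assumptions, none of which come from the text:

* a hypothetical three-element Markov partition on which $f$ is affine, so that a uniform density on $I_i$ is spread uniformly over $f(I_i)$;
* the incidence matrix `A` and the lengths `lam`;
* the transition rule $P_{ij} = \lambda(I_j)/\lambda(f(I_i))$ for $I_j \subseteq f(I_i)$.

The notation may differ from (B.25).

```python
import numpy as np

# Hypothetical 3-element Markov partition of [0, 1]; f is assumed affine on each I_i.
lam = np.array([0.5, 0.3, 0.2])        # lengths lambda(I_j)
A = np.array([[0, 1, 1],               # A[i, j] = 1 if I_j is contained in f(I_i)
              [1, 1, 0],
              [1, 1, 1]], dtype=float)

def tpm(A, lam):
    """P[i, j] = lambda(I_j) / lambda(f(I_i)) when I_j lies in f(I_i), else 0."""
    image_len = A @ lam                 # lambda(f(I_i)) for each i
    return A * lam[None, :] / image_len[:, None]

def fp_step(d, A, lam):
    """Frobenius-Perron operator applied to a piecewise constant density d (row vector).

    The mass d_i * lambda(I_i) sitting on I_i is spread uniformly over f(I_i)."""
    spread = d * lam / (A @ lam)        # density that I_i contributes on f(I_i)
    return spread @ A

P = tpm(A, lam)
D = np.diag(lam)
M = D @ P @ np.linalg.inv(D)            # matrix of the FP operator: d -> d @ M

d = np.array([1.2, 0.4, 1.4])           # a piecewise constant density (integrates to 1)
print(np.allclose(fp_step(d, A, lam), d @ M))
```

With densities as row vectors $d$ and probabilities $\pi = dD$, one step of the operator is $d \mapsto dM$ with $M = DPD^{-1}$. Equivalently, $\pi \mapsto \pi P$, which is the similarity transformation described in the sketch.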

Since $f$ gives rise to a Markov chain with primitive TPM, it has a unique stationary density that is nonzero almost everywhere (from part (a.)). Now, given any $k_\alpha(x)$, one can find a Markov partition with the endpoints of $\alpha$ as partition points. Because the TPM $P$ of the corresponding Markov chain is primitive, any vector of initial state probabilities $\pi(0)$ converges to the unique invariant probability vector of the TPM, which we denote $\pi$. Let $\pi(0)$ be chosen such that $\pi_j(0)$, the initial probability for the state corresponding to partition element $I_j$, is given by

$$ \pi_j(0) = \int_{I_j} k_\alpha(x)\,dx = \frac{\lambda(I_j \cap \alpha)}{\lambda(\alpha)}. \tag{B.29} $$

As defined, $\pi(0)$ is the probability vector corresponding to the density function $k_\alpha(x)$ for the chosen Markov partition. In addition, we know from the discussion in part (a.) that the Frobenius-Perron operator restricted to PDFs that are piecewise constant over this Markov partition has the matrix representation $M$ given by (B.25). Since $\pi(0)$ converges to $\pi$, the iterates $P_f^n(k_\alpha)$ converge to the piecewise constant density with values $\pi D^{-1}$, where $D$ is a diagonal matrix with $j$th diagonal element $\lambda(I_j)$. However, by the uniqueness of the stationary density of $f$, $\pi D^{-1}$ must equal this density. Therefore, the iterates of the initial PDF $k_\alpha(x)$ converge pointwise to the unique stationary PDF of $f$. The convergence is also convergence in $L^1(\mu)$, i.e., (B.28) holds with $g(x) = k_\alpha(x)$. This is because $P_f^n(k_\alpha)$ is piecewise constant for each $n$, with a finite number of pieces no greater than the number of elements in the chosen Markov partition. Since $\alpha$ is arbitrary, the result holds for each subinterval with endpoints given by eventually periodic points of $f$. Therefore, the result holds for all density functions.
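The convergence argument can be traced numerically on the same kind of hypothetical partition. This is a sketch under the following assumptions, none of which come from the text:

* the partition, the lengths and the incidence matrix are illustrative;
* $\alpha$ is taken to be the union of the last two elements, $[0.5, 1]$;
* the $L^1$ distance is measured with Lebesgue measure.

Because the stationary density here is bounded, Lebesgue-measure convergence also implies convergence in $L^1(\mu)$. For piecewise constant densities on the partition, this distance reduces to the $\ell^1$ distance between probability vectors.

```python
import numpy as np

# Same hypothetical partition as before (illustrative assumption, not from the text).
lam = np.array([0.5, 0.3, 0.2])
A = np.array([[0, 1, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
P = A * lam[None, :] / (A @ lam)[:, None]       # TPM, rows sum to 1

# Invariant probability vector pi: left eigenvector of P for eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi = pi / pi.sum()
stationary_density = pi / lam                    # the row vector pi D^{-1}

# k_alpha for alpha = [0.5, 1] (last two elements): pi_j(0) as in (B.29).
in_alpha = np.array([0.0, 1.0, 1.0])
pi_n = in_alpha * lam / (in_alpha @ lam)

l1_errors = []
for n in range(60):
    # sum_j |pi_j(n)/lam_j - pi_j/lam_j| * lam_j = sum_j |pi_j(n) - pi_j|
    l1_errors.append(np.abs(pi_n - pi).sum())
    pi_n = pi_n @ P                              # one step of the Markov chain
print(stationary_density, l1_errors[-1])
```

Since $P$ is row-stochastic, the $\ell^1$ distance to $\pi$ can never increase from one step to the next. Primitivity is what forces it to go to zero.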

We now show that if $f$ is exact with respect to an invariant measure $\mu$ that is nonzero over every subinterval of the unit interval, then the TPM of each Markov chain it gives rise to is primitive. Consider any Markov partition $\{I_j\}$ for $f$, and let $S_j$ denote the state of the corresponding Markov chain associated with $I_j$. Since $f$ is exact,

$$ \lim_{n \to \infty} \mu\big(f^n(I_j)\big) = 1. \tag{B.30} $$

However, since $I_j$ is a partition element, $f(I_j)$ is a union of partition elements, and hence so is $f^n(I_j)$ for every $n$. Furthermore, each partition element has nonzero, finite measure and the number of partition elements is finite, so $\mu(f^n(I_j))$ takes only finitely many values. Therefore, the exactness property must be satisfied for some finite integer $N(j)$, i.e., $\mu(f^{N(j)}(I_j)) = 1$. It must also be true that $f^{N(j)}(I_j) = I$ (except possibly on a set of measure zero), where $I$ is the unit interval. Because $f$ maps $I$ onto itself up to a set of measure zero, $f^n(I_j) = I$ for every $n \ge N(j)$ as well. In light of the correspondence between partition elements and states of the Markov chain, it follows that each state of the Markov chain is taken to every state of the chain after $N$ time steps, where $N$ is the largest of the $N(j)$. Therefore, the TPM of the Markov chain is primitive. □
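The argument above is constructive. For each element $I_j$, one looks for the first $N(j)$ at which $f^{N(j)}(I_j)$ covers every partition element, and then takes $N = \max_j N(j)$.

The sketch below performs that search on a 0/1 incidence matrix. The encoding is hypothetical, with `A[i, k] = 1` when $I_k \subseteq f(I_i)$. As a safeguard, it stops after Wielandt's bound $(n-1)^2 + 1$ steps, where $n$ is the number of states. A nonnegative $n \times n$ matrix that is not positive at that power is not primitive.

```python
import numpy as np

def cover_times(A):
    """N(j) for each j: least n >= 1 such that f^n(I_j) covers every partition element.

    A[i, k] = 1 if I_k is contained in f(I_i) (hypothetical incidence-matrix encoding).
    Returns None if some element fails to cover the whole interval within Wielandt's
    bound (n - 1)**2 + 1, in which case the matrix is not primitive.
    """
    A = (np.asarray(A) > 0).astype(int)
    n_states = A.shape[0]
    bound = (n_states - 1) ** 2 + 1
    times = []
    for j in range(n_states):
        reach = A[j].copy()                      # elements inside f(I_j)
        for n in range(1, bound + 1):
            if reach.all():
                times.append(n)
                break
            reach = (reach @ A > 0).astype(int)  # elements inside f^(n+1)(I_j)
        else:
            return None
    return times

A = np.array([[0, 1, 1], [1, 1, 0], [1, 1, 1]])
N = cover_times(A)
print(N, np.all(np.linalg.matrix_power(A, max(N)) > 0))   # [2, 2, 1] True
```

A permutation matrix such as `[[0, 1], [1, 0]]` returns `None`. It corresponds to a chain that is irreducible but periodic, and therefore not primitive.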

Proof of Proposition 4: As noted in the discussion before the statement of the proposition, if the sequence of affine parameter pairs $\{(\tau(i,\hat{x}_{ML}), \beta(i,\hat{x}_{ML}))\}_{i=0}^{N}$ is known, then one can determine the ML orbit segment. As a consequence of the following lemma, one can determine this sequence by exploiting the relation between noise-corrupted orbit segments of MC maps and hidden Markov models (HMMs).

Lemma: Consider any MC map which gives rise to arbitrarily fine Markov partitions and any finite set of observations $Y = \{y(i)\}_{i=0}^{N}$. One can find a Markov partition for which the corresponding Markov chain has the following property: for an appropriately defined HMM (with the definition not dependent on the observation sequence), the sequence of affine parameter pairs associated with the most likely state sequence $S_{ML}$ is identical to the sequence of affine parameter pairs associated with $\hat{x}_{ML}$. (The definition of $S_{ML}$, and the sequences of affine parameter pairs associated with $S_{ML}$ and with $\hat{x}_{ML}$, are provided in the discussion before the statement of the proposition.)

Proof of Lemma: As in Section 7.6, let $L$ denote the minimum number of affine segments of $f$. Let $(\tau_i, \beta_i)$ denote the pair of affine parameters associated with the $i$th affine segment, and let $\Delta_i$ denote the domain of this segment, so that $f(x) = \tau_i x + \beta_i$ if $x \in \Delta_i$. As noted in the section, one can associate a sequence of $N+1$ affine segment domains $\{\Delta(i,x)\}_{i=0}^{N}$ with each initial condition $x$. This sequence is associated with a subinterval of initial conditions $\Delta(x)$, given by

$$ \Delta(x) = \bigcap_{i=0}^{N} f^{-i}\big(\Delta(i,x)\big), \tag{B.31} $$

where $f^{-i}$ denotes the inverse image of the composed map $f^i$. One can show that two such subintervals $\Delta(x)$ and $\Delta(y)$ are either identical or disjoint. One can also show that these subintervals form a partition of the unit interval, which is in fact a Markov partition. Thus the subintervals form a set of equivalence classes of points on the unit interval. Let $\Delta_{eq} = \{\Delta(x_i)\}$ denote a complete set of these equivalence classes, i.e., the $\{\Delta(x_i)\}$ are disjoint subintervals whose union equals the unit interval. There are $L^{N+1}$ sequences of affine segment domains, not all of which are associated with initial conditions. As such, there are at most $L^{N+1}$ equivalence classes in $\Delta_{eq}$. It follows from the discussion in the section that the restriction of the log-likelihood function (7.50) to each equivalence class in $\Delta_{eq}$ (when treated as a function of the initial condition $x$) is a quadratic function of $x$. This is because the same sequence of $N+1$ affine parameter pairs is associated with each point in an equivalence class.
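The quadratic structure on each equivalence class can be made concrete. On a class with itinerary $s_0, \ldots, s_N$, each iterate is affine, $f^i(x) = a_i x + b_i$, so the Gaussian log-likelihood $-\sum_i (y(i) - a_i x - b_i)^2 / (2\sigma^2) + \text{const}$ is a concave quadratic in $x$. Its maximum over the class interval is therefore the unconstrained maximizer projected onto that interval.

The sketch below relies on the following assumptions and hypothetical names:

* the doubling map serves as a stand-in with $L = 2$ segments;
* the observation model is $y(i) = f^i(x) + w(i)$ with white Gaussian noise of variance $\sigma^2$, consistent with (B.34);
* the helper names are ours.

```python
import numpy as np

# Hypothetical example with L = 2 affine segments: the doubling map,
# f(x) = TAU[i] * x + BETA[i] for x in DOMAINS[i].
TAU = np.array([2.0, 2.0])
BETA = np.array([0.0, -1.0])
DOMAINS = [(0.0, 0.5), (0.5, 1.0)]

def segment(x):
    """Index i of the affine segment domain containing x."""
    return 0 if x < 0.5 else 1

def itinerary(x, N):
    """Segment indices of x, f(x), ..., f^N(x), i.e. the sequence Delta(i, x)."""
    s = []
    for _ in range(N + 1):
        i = segment(x)
        s.append(i)
        x = TAU[i] * x + BETA[i]
    return s

def compose(s):
    """Coefficients (a_i, b_i) with f^i(x) = a_i x + b_i for all x sharing itinerary s."""
    a, b = [1.0], [0.0]
    for i in s[:-1]:
        a.append(TAU[i] * a[-1])
        b.append(TAU[i] * b[-1] + BETA[i])
    return np.array(a), np.array(b)

def class_interval(s):
    """Closure of Delta(x) in (B.31): intersect the preimages f^{-i}(Delta_{s_i})."""
    a, b = compose(s)
    lo, hi = 0.0, 1.0
    for ai, bi, i in zip(a, b, s):
        u, v = (DOMAINS[i][0] - bi) / ai, (DOMAINS[i][1] - bi) / ai
        lo, hi = max(lo, min(u, v)), min(hi, max(u, v))
    return lo, hi

def ml_on_class(y, s, sigma):
    """Maximize the Gaussian log-likelihood, a concave quadratic in x, over the class."""
    a, b = compose(s)
    lo, hi = class_interval(s)
    x_star = np.dot(a, y - b) / np.dot(a, a)       # unconstrained maximizer
    x_hat = min(max(x_star, lo), hi)                # project onto the closed class interval
    r = y - (a * x_hat + b)
    loglik = -0.5 * len(y) * np.log(2 * np.pi * sigma**2) - np.dot(r, r) / (2 * sigma**2)
    return x_hat, loglik

s = itinerary(0.3, 5)
print(s, class_interval(s))   # [0, 1, 0, 0, 1, 1] (0.296875, 0.3125)
```

Running `ml_on_class` over every class in $\Delta_{eq}$ and keeping the best value would recover $\hat{x}_{ML}$ by brute force, at a cost of up to $L^{N+1}$ classes. The HMM route in the proof avoids this enumeration.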

Let $LF(\Delta(x_i))$ denote the supremum (generally the maximum) of the likelihood function $p(Y; x)$, whose logarithm is given by (7.50), restricted to $x \in \Delta(x_i)$. By definition, the likelihood function attains its maximum value on the unit interval at $\hat{x}_{ML}$. It follows that this value is also the largest of the $\{LF(\Delta(x_i))\}$ and is associated with the subinterval $\Delta(x_j)$ which contains $\hat{x}_{ML}$, so that $\Delta(\hat{x}_{ML}) = \Delta(x_j)$. Let $\delta_{MIN}$ denote the difference between $LF(\Delta(\hat{x}_{ML}))$ and the next largest of the $\{LF(\Delta(x_i))\}$. In other words, $\delta_{MIN}$ represents the smallest difference between the value of the likelihood function for $\hat{x}_{ML}$ and its value for all other initial conditions on the unit interval whose associated sequences of affine segment domains differ from the sequence associated with $\hat{x}_{ML}$.

Now consider any Markov partition for $f$ and its associated Markov chain. Also, consider the HMM introduced in Section 7.6. It uses the state transition pseudo-probabilities of that section and the initial state pseudo-probabilities $\eta(S_j) = 1$. The output PDF $p(y|S_j)$ associated with each state $S_j$ is given by

$$ p(y|S_j) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left[-\frac{(y - H_j)^2}{2\sigma^2}\right], \tag{B.32} $$

where $H_j \in I_j$ and $I_j$ is the partition element associated with $S_j$.
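The most likely state sequence for this HMM can be computed with the Viterbi algorithm. The sketch below works in the log domain and rests on the following labelled assumptions:

* the transition pseudo-probabilities are taken to be 1 for allowed transitions ($I_k \subseteq f(I_j)$) and 0 otherwise;
* $\eta(S_j) = 1$, as stated above;
* $H_j$ is taken to be the midpoint of $I_j$, whereas the text only requires $H_j \in I_j$.

The four-element doubling-map partition in the usage lines is a hypothetical example, and `viterbi_pseudo` is our name.

```python
import numpy as np

def viterbi_pseudo(y, A, H, sigma):
    """Most likely state sequence for the pseudo-probability HMM (log-domain Viterbi).

    y     : observations y(0..N)
    A     : A[j, k] = 1 if the transition S_j -> S_k is allowed (assumed 0/1
            pseudo-probabilities; initial pseudo-probabilities eta(S_j) = 1)
    H     : representative points H_j of the partition elements I_j
    sigma : noise standard deviation in the Gaussian output PDF (B.32)
    """
    y, H = np.asarray(y, float), np.asarray(H, float)
    with np.errstate(divide="ignore"):
        logA = np.log(np.asarray(A, float))          # -inf for forbidden transitions
    def log_out(yi):
        return -0.5 * np.log(2 * np.pi * sigma**2) - (yi - H) ** 2 / (2 * sigma**2)
    score = log_out(y[0])                             # log eta(S_j) = 0 for every state
    back = []
    for yi in y[1:]:
        cand = score[:, None] + logA                  # cand[j, k]: best path to j, then j -> k
        back.append(np.argmax(cand, axis=0))
        score = cand.max(axis=0) + log_out(yi)
    path = [int(np.argmax(score))]
    for bp in reversed(back):                         # backtrack through stored predecessors
        path.append(int(bp[path[-1]]))
    return path[::-1], float(score.max())

# Hypothetical example: doubling map with partition [0,1/4],[1/4,1/2],[1/2,3/4],[3/4,1].
A = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 0, 1, 1]])
H = np.array([0.125, 0.375, 0.625, 0.875])     # midpoints, one choice of H_j in I_j
y = np.array([0.3, 0.6, 0.2, 0.4, 0.8, 0.6])   # noise-free orbit of x = 0.3
path, log_q = viterbi_pseudo(y, A, H, sigma=0.05)
print(path)   # [1, 2, 0, 1, 3, 2]
```

Under the 0/1 assumption, the returned score is the log of the product of the output PDFs along the path. This is the product that appears in the bound (B.36) below.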

A fact which follows from the discussion in the chapter is that there is a state sequence $S_x = \{S_x(i)\}_{i=0}^{N}$ associated with each initial condition $x$, where $S_x(i) = S_j$ if $f^i(x) \in I_j$. In addition, consider any state sequence $S = \{S(i)\}_{i=0}^{N}$ with nonzero probability, i.e., $P(S) > 0$. There is an initial condition $x$ (actually, a subinterval of initial conditions) associated with $S$, in the sense that $S_x = S$, which means that $S_x(i) = S(i)$ for $0 \le i \le N$. It follows that for any state sequence $S$ for which $q(S,Y) > 0$, there exists some initial condition $x$ such that $S_x = S$. Here $q(S,Y)$ is the joint pseudo-likelihood of the state sequence and observation set.

For an arbitrary initial condition $x$, we now derive an upper bound on the absolute difference $|q(S_x, Y) - p(Y; x)|$. This quantity can be read in two ways:

* the absolute difference between the joint pseudo-likelihood of the state sequence associated with $x$ and the likelihood of $x$;
* the absolute difference between the joint pseudo-likelihood of a state sequence and the likelihood of any initial condition associated with that state sequence.

In light of the close relation between $q(S_x, Y)$ and the log-likelihood function (7.34), what we are in fact upper bounding is the difference between two values of the likelihood function. The first is its value for the pseudo-orbit segment $\{H(S_x(i))\}_{i=0}^{N}$, where $H(S_x(i)) = H_j$ if $S_x(i) = S_j$. The second is its value for any actual orbit segment with associated state sequence $S_x$. We now show that such an upper bound is provided by $\gamma_{MAX}$, which is given by

$$ \gamma_{MAX} = \frac{1 - \left(1 - \dfrac{\epsilon\, e^{-1/2}}{\sigma}\right)^{N+1}}{(2\pi\sigma^2)^{(N+1)/2}}, \tag{B.33} $$

where $\epsilon$ is the length of the longest partition element.

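To establish the validity of (B.33), we first note that for a fixed $y$, the Gaussian PDF $p(y; x)$ given by

$$ p(y;x) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left[-\frac{(y-x)^2}{2\sigma^2}\right] \tag{B.34} $$

has a slope (as a function of $x$) attaining its maximum absolute value at $x = y \pm \sigma$, with this maximum absolute value $p'_{MAX}$ given by

$$ p'_{MAX} = \frac{\exp\left[-\tfrac{1}{2}\right]}{(2\pi\sigma^4)^{1/2}}. \tag{B.35} $$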

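Therefore, for any two points $x_1$ and $x_2$ with $|x_1 - x_2| \le \epsilon$, the absolute difference $|p(y; x_1) - p(y; x_2)|$ is upper bounded by $p'_{MAX}\,\epsilon$. In addition, one can show that $(a + c)^n - (b + c)^n \ge a^n - b^n$ for positive real numbers $a$, $b$, and $c$ with $a > b$, and for any positive integer $n$. The maximum value of $p(y; x)$ is $(2\pi\sigma^2)^{-1/2}$ (attained at $x = y$). Now fix a sequence $\{y(i)\}_{i=0}^{N}$, and let $\{x_1(i)\}_{i=0}^{N}$ and $\{x_2(i)\}_{i=0}^{N}$ be variable sequences satisfying $|x_1(i) - x_2(i)| \le \epsilon$ for each $i$, with $\epsilon$ small enough that $\epsilon\,e^{-1/2}/\sigma < 1$. It follows that an upper bound on the absolute difference $\left|\prod_{i=0}^{N} p(y(i); x_1(i)) - \prod_{i=0}^{N} p(y(i); x_2(i))\right|$ is given by

$$ \left|\prod_{i=0}^{N} p(y(i);x_1(i)) - \prod_{i=0}^{N} p(y(i);x_2(i))\right| \le \left(\frac{1}{(2\pi\sigma^2)^{1/2}}\right)^{N+1} - \left(\frac{1}{(2\pi\sigma^2)^{1/2}} - p'_{MAX}\,\epsilon\right)^{N+1} = \frac{1 - \left(1 - \dfrac{\epsilon\, e^{-1/2}}{\sigma}\right)^{N+1}}{(2\pi\sigma^2)^{(N+1)/2}} \tag{B.36} $$

$$ = \gamma_{MAX}. \tag{B.37} $$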

Now, the product $\prod_{i=0}^{N} p(y(i); x_1(i))$ is the likelihood of a state sequence if, for each $i$, $x_1(i) = H_j$ for some $j$. Similarly, the product $\prod_{i=0}^{N} p(y(i); x_2(i))$ is the value of the likelihood function when $\{x_2(i)\}$ is an orbit segment. Since $H(S_x(i))$ and $f^i(x)$ lie in the same partition element, whose length is at most $\epsilon$, (B.37) provides an upper bound on $|q(S_x, Y) - p(Y; x)|$ as well.
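The bound (B.35) to (B.37) can be spot-checked numerically. The following sketch uses illustrative values of $\sigma$, $\epsilon$ and $N$ that are not taken from the text. It checks three things:

* that the maximum slope of the Gaussian matches $p'_{MAX}$;
* that the two forms of $\gamma_{MAX}$ in (B.36) agree;
* that randomly perturbed sequences with $|x_1(i) - x_2(i)| \le \epsilon$ never exceed the bound.

```python
import numpy as np

def gauss(y, x, sigma):
    """Gaussian PDF p(y; x) of (B.34)."""
    return np.exp(-(y - x) ** 2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

def p_prime_max(sigma):
    """Maximum |dp(y; x)/dx| from (B.35), attained at x = y +/- sigma."""
    return np.exp(-0.5) / np.sqrt(2 * np.pi * sigma**4)

def gamma_max(eps, sigma, N):
    """gamma_MAX of (B.33); requires eps * exp(-1/2) / sigma <= 1."""
    c = eps * np.exp(-0.5) / sigma
    assert c <= 1.0, "partition too coarse for this form of the bound"
    return (1.0 - (1.0 - c) ** (N + 1)) / (2 * np.pi * sigma**2) ** ((N + 1) / 2)

# Monte Carlo check of (B.36)-(B.37) with |x1(i) - x2(i)| <= eps (illustrative values).
rng = np.random.default_rng(0)
sigma, eps, N = 0.1, 0.01, 5
worst = 0.0
for _ in range(2000):
    y = rng.uniform(0, 1, N + 1)
    x1 = y + rng.uniform(-2 * sigma, 2 * sigma, N + 1)
    x2 = x1 + rng.uniform(-eps, eps, N + 1)
    diff = abs(np.prod(gauss(y, x1, sigma)) - np.prod(gauss(y, x2, sigma)))
    worst = max(worst, diff)
print(worst <= gamma_max(eps, sigma, N))   # True
```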

Because $\gamma_{MAX}$ is an increasing function of $\epsilon$ that tends to zero as $\epsilon \to 0$, and $f$ gives rise to arbitrarily fine Markov partitions (by assumption), we can find a Markov partition with $\epsilon$ small enough that

$$ 2\gamma_{MAX} < \delta_{MIN}. \tag{B.38} $$

Here $\delta_{MIN}$ was defined earlier as the smallest difference between the value of the likelihood function for $\hat{x}_{ML}$ and its value for all other initial conditions on the unit interval whose associated sequences of affine segment domains differ from the sequence associated with $\hat{x}_{ML}$.
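Because the closed form (B.33) is monotone in $\epsilon$, it can be inverted to give the largest admissible partition-element length for a given $\delta_{MIN}$. This is only an illustration: $\delta_{MIN}$ depends on $Y$ and $\hat{x}_{ML}$ and is generally not known in advance. The values and the helper name `eps_threshold` below are hypothetical.

```python
import numpy as np

def gamma_max(eps, sigma, N):
    """gamma_MAX of (B.33), valid for eps * exp(-1/2) / sigma <= 1."""
    c = eps * np.exp(-0.5) / sigma
    return (1.0 - (1.0 - c) ** (N + 1)) / (2 * np.pi * sigma**2) ** ((N + 1) / 2)

def eps_threshold(delta_min, sigma, N):
    """Largest eps with 2 * gamma_MAX(eps) <= delta_min, obtained by inverting (B.33).

    Any partition whose longest element is strictly shorter than this value
    satisfies (B.38)."""
    scale = (2 * np.pi * sigma**2) ** ((N + 1) / 2)
    t = 0.5 * delta_min * scale          # need 1 - (1 - c * eps)**(N + 1) <= t
    c = np.exp(-0.5) / sigma
    if t >= 1.0:                         # bound holds for every eps in the valid range
        return 1.0 / c
    return (1.0 - (1.0 - t) ** (1.0 / (N + 1))) / c

eps = eps_threshold(delta_min=100.0, sigma=0.1, N=5)
print(eps, 2 * gamma_max(0.99 * eps, 0.1, 5) < 100.0)
```

The threshold shrinks as $N$ grows, since $\gamma_{MAX}$ involves a product over $N+1$ observations. Longer observation sets therefore call for finer Markov partitions.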

Given such a Markov partition and the Markov chain corresponding to the partition, consider $q(S_{\hat{x}_{ML}}, Y)$, the joint likelihood of the observation set and the state sequence associated with $\hat{x}_{ML}$. Because $S_{\hat{x}_{ML}}$ is the state sequence associated with $\hat{x}_{ML}$ and (B.38) holds, it follows that

$$ q(S_{\hat{x}_{ML}}, Y) \ge p(Y; \hat{x}_{ML}) - \gamma_{MAX} \tag{B.39} $$

$$ > p(Y; \hat{x}_{ML}) - \frac{\delta_{MIN}}{2}. \tag{B.40} $$


Now let $S_{BIG}$ denote the state sequence which maximizes the joint likelihood $q(S, Y)$ among those state sequences whose associated sequence of affine parameter pairs is different from the sequence associated with $S_{\hat{x}_{ML}}$. As noted earlier, if $q(S_{BIG}, Y) > 0$, then $S_{BIG}$ is associated with at least one initial condition $z$ of $f$. However, the same sequence of affine parameter pairs is associated with both $S_{BIG}$ and $z$, and this sequence is different from the sequence associated with $\hat{x}_{ML}$. Therefore the following must be true:


$$ p(Y; z) \le p(Y; \hat{x}_{ML}) - \delta_{MIN} \tag{B.41} $$

by definition of $\delta_{MIN}$. Therefore, we have the following chain of inequalities:

$$ q(S_{BIG}, Y) \le p(Y; z) + \gamma_{MAX} \tag{B.42} $$

$$ \le p(Y; \hat{x}_{ML}) - \delta_{MIN} + \gamma_{MAX} \tag{B.43} $$

$$ < p(Y; \hat{x}_{ML}) - \delta_{MIN} + \frac{\delta_{MIN}}{2} \tag{B.44} $$

$$ = p(Y; \hat{x}_{ML}) - \frac{\delta_{MIN}}{2}. \tag{B.45} $$

Combining (B.40) and (B.45) yields the following:

$$ q(S_{\hat{x}_{ML}}, Y) > p(Y; \hat{x}_{ML}) - \frac{\delta_{MIN}}{2} \tag{B.46} $$

$$ > q(S_{BIG}, Y). \tag{B.47} $$


This inequality holds for the chosen Markov partition and for any refinement of it. It means that the state sequence associated with $\hat{x}_{ML}$ has a greater joint likelihood (or, equivalently, a greater likelihood) than any other state sequence whose associated sequence of affine parameter pairs differs from that associated with $\hat{x}_{ML}$. Therefore, the sequence of affine parameter pairs associated with the state sequence of largest likelihood must be identical to that associated with $\hat{x}_{ML}$.

An important observation is that this result does not imply that $S_{\hat{x}_{ML}}$ is the most likely state sequence. The sequence of affine parameter pairs associated with $S_{\hat{x}_{ML}}$ is associated with other state sequences as well, and one of these state sequences may be the most likely state sequence. □

In light of the lemma, one can find the sequence of affine parameter pairs associated with $\hat{x}_{ML}$ in three steps. First, find a sufficiently fine Markov partition and its corresponding Markov chain. Second, determine the most likely state sequence for an appropriately defined HMM. Third, read off the sequence of affine parameter pairs associated with this state sequence. If the partition is fine enough, this sequence of affine parameter pairs will be the same as the sequence associated with $\hat{x}_{ML}$. □